Part 1 of Utilitarian Polyglot: What languages do you need in order to speak with the most people?
What?! Another series? Yes, this one's lighter reading than the previous stuff, and has nice charts! See it as a fascinating deep-dive into the mind of a utilitarian polyglot.
An interesting thing to do when you're learning languages is to estimate the number of people in the world you could have a conversation with. For example, leosmith, the polyglot founder of OPLingo, can converse with a third of the human race (I have it on good authority that aliens have shortlisted him for abduction, brain-washing, and subsequent employment as an ambassador to humans... he had it coming).
But what if you're just starting on your journey as a polyglot, only see languages as tools, and want to adopt the most efficient approach to your language-learning career? Surely you would want to learn the languages spoken by the most people, but also capitalize on people's proficiency in other languages. For example, if you're going to Canada, you could say that speaking English mostly covers it, so there's little need to learn French. You would lose out on a lot of things, but if you see language-learning as a purely utilitarian endeavor then this would be a valid decision.
So, how will you go about deciding which languages to learn? Well, we're going to help you out on this journey. You're welcome.
A modest roadmap
To facilitate our task, we'll assume that you speak English. Data on English proficiency worldwide is relatively easy to find (or at least it exists): this is not necessarily the case for other languages, which is why we'll be taking English as our starting point.
Let's take a data-based approach here. If one speaks English, which languages should one learn in order to maximise the number of people one can have a conversation with?
What we want to do is the following:
- Determine how many people speak English in each country
- Identify the other languages spoken in these countries
- Find the languages that will allow you to converse with the largest number of people while avoiding overlaps in your target populations
The end result should be pretty close to the straightforward approach of just picking the most spoken languages one after the other, but what is interesting is to what extent the two strategies will differ.
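To make the difference between the two strategies concrete, here is a minimal sketch of what "avoiding overlaps" could look like in code. Everything in it is an assumption made for illustration: the three countries and their figures are invented placeholders (not real statistics), the function names are hypothetical, and it treats speaking one language as independent of speaking another within a country, which real populations will not respect. It is not the code behind this series.

```python
from math import prod

# Hypothetical toy data, NOT real statistics:
# country -> (population in millions, {language: share of people who speak it})
COUNTRIES = {
    "A": (100.0, {"English": 0.5, "Spanish": 0.9}),
    "B": (200.0, {"English": 0.1, "Hindi": 0.6}),
    "C": (50.0, {"Spanish": 0.8}),
}


def raw_speakers(language, countries=COUNTRIES):
    """Total speakers of one language (millions), ignoring any overlap."""
    return sum(pop * shares.get(language, 0.0) for pop, shares in countries.values())


def reach(languages, countries=COUNTRIES):
    """People (millions) who speak at least one of `languages`.

    Assumption: within a country, speaking one language is independent
    of speaking another. This is a simplification.
    """
    total = 0.0
    for pop, shares in countries.values():
        # Fraction of the country that speaks none of the chosen languages.
        unreached = prod(1.0 - shares.get(lang, 0.0) for lang in languages)
        total += pop * (1.0 - unreached)
    return total


def greedy_pick(known, n_extra, countries=COUNTRIES):
    """Add, one at a time, the language with the largest gain in reach."""
    chosen = list(known)
    candidates = {lang for _, shares in countries.values() for lang in shares}
    candidates -= set(chosen)
    for _ in range(n_extra):
        if not candidates:
            break
        # sorted() only makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda lang: reach(chosen + [lang], countries))
        chosen.append(best)
        candidates.remove(best)
    return chosen


if __name__ == "__main__":
    print(raw_speakers("Spanish"), raw_speakers("Hindi"))
    print(greedy_pick(["English"], 1))
```

In this made-up setup, Spanish has more speakers overall than Hindi, yet the greedy pick after English is Hindi, because many of the Spanish speakers in country A are already reachable in English. That is the kind of divergence we are curious about.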
Before we dig in, it's worth saying that this analysis needs to be taken with a grain of salt. Why? The data this is based on comes from different sources, with varying degrees of reliability, and a lot of assumptions will be made about the languages spoken in each country if data is not available. Hopefully, citing the sources and being as transparent as possible about the design of this analysis will help us circumvent most issues. The good thing is that, if you want to change something, the code and data are available: if this is your thing, your contributions are welcome!
Today we'll just deal with the first bullet point. But watch this space.
Here we go!
A lot of people speak English already
One would have thought that nigh-exhaustive, robust data would be readily available for English proficiency across the world - lingua franca and all that - but one would be wrong. The data might be hiding somewhere, but I did not find it.
The List of countries by English-speaking population on Wikipedia is the next best thing. It gives an estimate of the English-speaking population for 130 of the world's ~200 countries. The information is not the most up-to-date, proficiency is almost entirely self-reported, the sources cited for some countries are unconvincing, some numbers are questionable, and "English-speaking" does not have a standardized meaning, but... we can definitely work with this (trust me).
Some countries with very large populations were not included in the Wikipedia list, so I had to look around for other, sometimes anecdotal, sources. I've been able to add the most robust information to Wikipedia (adding to the sum of human knowledge always feels great). In five cases, actual sources were not available, so I had to resort to the wisdom of crowds: random comments on discussion boards, blog posts, the direction the wind was blowing that day, etc. This wouldn't do for Wikipedia, but (corroborated) anecdotal evidence is fair game in our case, given our purposes. The ~70 countries (including currently unrecognized ones) that we have no information for have a combined population of ~800 million people, or close to 10% of the total world population. Based on some quick and dirty checks, it seems safe to say that countries with missing data have proficiency levels below 10-15%.
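For a rough sense of what that gap could hide, the two figures above can be multiplied out. This is only a back-of-the-envelope sketch: it uses the ~800 million and the 10-15% ceiling quoted in the text and nothing else, and the function name is made up for illustration.

```python
def max_uncounted_speakers(missing_population, proficiency_ceiling):
    """Upper bound on English speakers in countries we have no data for."""
    return missing_population * proficiency_ceiling


MISSING_POPULATION = 800e6  # ~800 million people, as quoted in the text

low_ceiling = max_uncounted_speakers(MISSING_POPULATION, 0.10)
high_ceiling = max_uncounted_speakers(MISSING_POPULATION, 0.15)

if __name__ == "__main__":
    print(f"{low_ceiling / 1e6:.0f} to {high_ceiling / 1e6:.0f} million at most")
```

So if the 10-15% ceiling holds, the countries without data account for somewhere up to 80 to 120 million English speakers.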
You will notice that the Wiki page already has a map. Why the hell have we been going through this then?! Well, the map is not accurate, as it doesn't actually represent the data the page is talking about.
You might inform me of the existence of the EF English Proficiency Index, but this is also not relevant to us. The EF EPI measures the English-language skills of those who have taken the test administered by a company that sells English training. The self-selection bias is huge here, so generalizing the results to wider populations and making bold inferences would be very wrong. Even so, the number of news outlets and even officials who misconstrue the EPI as a measure of a country's ability in English is staggering. But I digress.
Back to our utilitarian polyglot. According to our data, English would allow one to speak to 1.5 billion people, or 20% of the world population. This is a bit higher than other estimates I've seen, but it passes our undemanding sanity check. The pictogram below will be keeping track of our progress.
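The headline number is just a weighted sum: each country's population times its share of English speakers, added up. Here is a minimal sketch of that calculation, plus the sanity check implied by the two figures above. The function name and the row format are assumptions for illustration, and the only real numbers used are the 1.5 billion and 20% quoted in the text.

```python
def total_speakers(rows):
    """Sum population * share over an iterable of (population, share) pairs."""
    return sum(population * share for population, share in rows)


# Sanity check using only the figures quoted in the text:
# if 1.5 billion people are 20% of the world, how big is the world?
ENGLISH_SPEAKERS = 1.5e9
SHARE_OF_WORLD = 0.20
implied_world_population = ENGLISH_SPEAKERS / SHARE_OF_WORLD

if __name__ == "__main__":
    print(f"Implied world population: {implied_world_population / 1e9:.1f} billion")
```

The two figures imply a world population of about 7.5 billion, so they are at least consistent with each other.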
Below is the map that we get from our data. The interactive version is attached and also available here, if you want to play around. Pretty neat!
A few points of interest:
- The Scandinavians are smashing it, as always.
- Those living in developing countries and in countries with very large populations are less likely to speak English.
- Most of the countries we're missing data on are in the Global South (expected), with some of the richest countries also in the mix (unexpected).
- The Japan and South Korea phenomenon: two of the richest countries, firmly embedded in the global economy, have surprisingly low proportions of English speakers. Some describe it as a cultural pushback, some as stubbornness, but maybe someone with first-hand experience can weigh in here.
What do you guys think?