You are invited to consult the latest study of the Observatory of Linguistic and Cultural Diversity on the Internet about the presence of languages on the Internet. The methodology, data sources, products and biases of the study are documented in English, French, Portuguese and Spanish.
The internationalization of the Internet is well advanced and its center of gravity is moving away from Western countries. The percentage of the Web in English was measured at 50% in 2007 and 30% in 2017, and it has now reached 25%.
However, the media keep reporting figures above 50% for English, supported by W3Techs' daily observations. How can this source be so wrong? "It's multilingualism, stupid!", to paraphrase the famous expression: the economic engine of the Web is now multilingualism rather than any particular language.
Failing to account for multilingualism leads to large errors when language recognition algorithms are applied only to home pages and percentages are computed on the world population instead of the population of speakers (according to Ethnologue, (L1+L2)/L1 = 1.43, see note below).
How could the English share of the Web have remained stable at 50% over the last 14 years while the demography of the Internet changed radically and connected English speakers (L1+L2) fell from 32% of all connected persons in 2007 to only 13% today? English was indeed measured at around 50% of Web content between 2007 and 2009 (see here). Since then, the exponential growth of Chinese, Hindi, Arabic, Turkish, Bengali, Vietnamese, Urdu, Persian and Marathi, to name only languages in the top 20, which together account for close to 28% of content, has changed the situation radically. English today represents only a quarter of content, which is still about double its share of the connected population and a spectacular performance.
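The "double" comparison is simple arithmetic on the two figures quoted above (25% of content, 13% of connected persons). A minimal sketch follows; the function name `content_to_population_ratio` is a hypothetical label chosen for illustration, not a term taken from the study.

```python
def content_to_population_ratio(content_share: float, connected_share: float) -> float:
    """Ratio of a language's share of Web content to its share of connected speakers.

    Both arguments are fractions between 0 and 1. The function name is
    hypothetical; it only restates the comparison made in the text.
    """
    if connected_share <= 0:
        raise ValueError("connected_share must be positive")
    return content_share / connected_share


# Figures quoted in the text for English today.
english_ratio = content_to_population_ratio(content_share=0.25, connected_share=0.13)
print(round(english_ratio, 2))
```

A value above 1 means the language has more content than its connected population alone would suggest; for English the ratio comes out a little under 2.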
English remains the first language of the Web in terms of power, but the proportions are changing drastically. Chinese is now the language with the most connected speakers. In terms of power, Spanish holds a solid third place, followed by French and Hindi, and a group of five languages follows closely behind: Portuguese, Russian, Arabic, German and Japanese.
As for the indicators that are independent of the number of speakers (capacity and gradient), the languages of countries rated high in Information Society parameters lead: Hebrew, Finnish, Swedish, Dutch, German and Danish.
The most connected speakers are those of Danish, Swedish, Japanese, Dutch, Swiss German and Finnish.
This study was made possible by the support of the Cultural and Educational Department of the Brazilian Ministry of Foreign Affairs, within the framework of the International Institute of the Portuguese Language and the coordination of the UNESCO Chair on language policies for multilingualism, under the lead of Gilvan Müller de Oliveira. Thanks to Álvaro Blanco for his programming support and also to David Pimienta. The idea of collecting diverse sources to measure languages on the Internet, and of transforming figures per country into figures per language, was first proposed by Daniel Prado in 2012.
The Observatory, with the support of the Organisation Internationale de la Francophonie, will offer new results before the end of 2021, with new approaches to remove the remaining biases and with coverage extended to the 328 languages with more than 1 million L1 speakers (as opposed to the current coverage of 138 languages with more than 5 million L1 speakers).
This information is open and public. Feel free to circulate it wherever you feel appropriate, and do not hesitate to open a dialogue with email@example.com about questions, doubts or documented criticism.
L1 stands for mother tongue (also referred to as first language).
L2 stands for second language.
The ratio (L1+L2)/L1 shows the worldwide importance of multilingualism (in other words, 43% of the world population speaks more than one language).
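To illustrate why the choice of denominator matters, here is a minimal sketch using three made-up languages. The speaker counts are placeholder values chosen so that the ratio comes out at 1.43; they are not Ethnologue data. The sketch also assumes every person has exactly one L1, so that the sum of L1 speakers stands in for the world population.

```python
# Toy speaker counts in millions. Placeholder values for illustration only,
# NOT taken from Ethnologue or from the study.
speakers = {
    "A": {"l1": 500, "l2": 300},
    "B": {"l1": 300, "l2": 100},
    "C": {"l1": 200, "l2": 30},
}

# Assumption: each person has exactly one L1, so summing L1 gives the population.
world_population = sum(s["l1"] for s in speakers.values())

# Counting L1 and L2 speakers counts multilingual people more than once.
speaker_population = sum(s["l1"] + s["l2"] for s in speakers.values())

multilingualism_ratio = speaker_population / world_population

# Share of each language computed over the two possible denominators.
shares_over_world = {
    name: (s["l1"] + s["l2"]) / world_population for name, s in speakers.items()
}
shares_over_speakers = {
    name: (s["l1"] + s["l2"]) / speaker_population for name, s in speakers.items()
}

print("ratio (L1+L2)/L1:", multilingualism_ratio)
print("sum of shares over world population:", sum(shares_over_world.values()))
print("sum of shares over speaker population:", sum(shares_over_speakers.values()))
```

With the world population as the denominator, the per-language shares add up to the ratio itself rather than to 100%, so they cannot be read as shares of a single whole. Dividing by the speaker population brings the total back to 100%.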