MAIN PROJECT – V2.0 (2021)

Indicators for the Presence of Language on the Internet

NOTE: This is an archived version of the study. A more up-to-date version is available.

Second, enhanced version of an alternative approach to producing indicators of languages on the Internet

Project Summary – V2.0 (2021)

As shown in 2017, the commendable effort of W3Techs to offer daily updated figures on Web contents by language is biased at several levels. The strongest bias, though not the only one, is that multilingualism is not taken into account: most multilingual websites that include English are probably counted as English only. As a result, this source reports a share of English contents on the Web that is greatly exaggerated (above 50%, whereas the real figure today is probably below 25%).

The lack of sources fuels the media myth that more than half of all websites are in English. This was the case between 2007 and 2009, but the exponential growth of Chinese, Hindi, Arabic, Turkish, Bengali, Vietnamese, Urdu, Persian and Marathi (to name the new languages among the top 20, which together account for close to 28% of contents) has since radically changed the situation, and English today represents only about a quarter of contents. Between 2000 and 2007, the myth was that English occupied 80% of the Web; it finally disappeared after UNESCO's 2009 publication, and a share of around 50% for English on the Web became the accepted figure.

How could English have remained stable at 50% over the last 14 years, while the demography of the Internet has changed radically and the share of English speakers (L1+L2) among all connected persons has fallen from 32% in 2007 to only 13% today?

English remains the first language of the Web in terms of power, but the proportions are changing drastically. Chinese is now the language with the most connected speakers. In terms of power, Spanish holds a solid third place, followed by French and Hindi; a group of five languages follows closely behind: Portuguese, Russian, Arabic, German and Japanese.

As for the indicators that are independent of the number of speakers (capacity and gradient), the languages of countries that rate highly on Information Society parameters lead: Hebrew, Finnish, Swedish, Dutch, German and Danish.

The most connected speakers are those of Danish, Swedish, Japanese, Dutch, Swiss German and Finnish.

All the results for the 132 languages of the study and a full description of the methodology can be read in the document below, available in English, French, Portuguese or Spanish.


Results of the 2021 Study (V2.0)

More Information

Percentage of English Pages in the Web

Cyber-Geography of Languages

Warning: The statistics cover only the 133 languages with more than 5 million L1 speakers

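| Language group     | %L1 + L2 | %CON.  | W%CON. | POWER  | CAPACITY | GRADIENT |
| African Languages  | 7.03%    | 31.11% | 4.00%  | 2.00%  | 0.284    | 0.519    |
| American Languages | 0.21%    | 53.80% | 0.21%  | 0.13%  | 0.595    | 0.623    |
| Asian Languages    | 45.86%   | 50.85% | 42.63% | 34.39% | 0.750    | 0.783    |
| Arabic Languages   | 3.53%    | 60.14% | 3.89%  | 3.09%  | 0.875    | 0.796    |
| European Languages | 30.26%   | 69.64% | 38.53% | 53.90% | 1.781    | 1.415    |
| Rest               | 13.10%   | 44.84% | 10.74% | 6.50%  |          |          |
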
  • W.Conn.: percentage of speakers of that language connected to the Internet related to total speakers connected to the Internet
  • W. Pop.: percentage of speakers of that language related to the total world L1+L2 population
  • L. Conn.: percentage of L1+L2 speakers of that language who are connected to the Internet
  • REST: represents the results for the full set of all languages of the world except the 15 languages listed in the table
  • ALL PERCENTAGES ARE OVER L1+L2 POPULATIONS
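
The definitions above suggest how the columns of the table relate to one another. The following is a minimal sketch, not the study's actual method. It assumes that the column "%L1 + L2" is W. Pop., "%CON." is L. Conn. and "W%CON." is W.Conn., that W.Conn. is each group's population share weighted by its connection rate and then normalised over all groups, and that CAPACITY is POWER divided by the population share. These formulas are inferred from the figures in the table and are not stated on this page; GRADIENT is left out because no simple ratio reproduces it closely. The names `GROUPS`, `connected_share` and `capacity` are illustrative.

```python
# Figures copied from the table above, in percent:
# (%L1 + L2, %CON., POWER) for each language group.
GROUPS = {
    "African Languages":  (7.03, 31.11, 2.00),
    "American Languages": (0.21, 53.80, 0.13),
    "Asian Languages":    (45.86, 50.85, 34.39),
    "Arabic Languages":   (3.53, 60.14, 3.09),
    "European Languages": (30.26, 69.64, 53.90),
    "Rest":               (13.10, 44.84, 6.50),
}


def connected_share(groups):
    """Share of all connected speakers per group, in percent.

    Assumed formula: population share times connection rate,
    normalised so that the shares of all groups sum to 100.
    """
    weights = {name: pop * con for name, (pop, con, _power) in groups.items()}
    total = sum(weights.values())
    return {name: 100.0 * w / total for name, w in weights.items()}


def capacity(groups):
    """Assumed formula: POWER divided by the share of world L1+L2 population."""
    return {name: power / pop for name, (pop, _con, power) in groups.items()}


if __name__ == "__main__":
    shares = connected_share(GROUPS)
    caps = capacity(GROUPS)
    for name in GROUPS:
        print(f"{name:<20} W%CON. = {shares[name]:6.2f}%   capacity = {caps[name]:.3f}")
```

Under these assumptions the computed W%CON. values agree with the table to within rounding, and the capacity values agree closely for every group except American Languages, where the two-decimal inputs are too coarse for a precise ratio.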

Summary of Results

  • The European languages began the dominance of the Internet and are its historical languages
  • The center of gravity is moving fast toward Asian and Arabic languages
  • African languages are behind but associated with major demographic growth

Credits

This version was made possible thanks to the support of Departamento de Cultura e Educação do Ministério das Relações Exteriores do Brasil within the framework of Instituto Internacional de Língua Portuguesa and under the coordination of Cátedra UNESCO em Políticas Linguísticas para Multilinguismo. Credit also goes to Daniel Prado, who first came up with the idea of collecting multiple sources to measure the presence of languages on the Internet, as well as transforming country data into language data.

Thank you: To Professor Gilvan Müller de Oliveira for his support with linguistic issues and coordination with funders; to Álvaro Blanco for writing delicate Excel macros that radically changed the handling of so many fonts and spellings of languages and countries, and to David Pimienta, who wrote the Excel macros necessary to transform the Ethnologue format into the format required for this study, as well as for the macrolanguage treatment.

Disclaimer: This study is essentially a statistical work based on a wide variety of sources. Adopting a major source in this type of work logically implies adopting the rules that underlie the data from that source. The author is not responsible for the list of countries and territories considered, which is established by the ITU, a United Nations body; nor for the list of languages with more than five million L1 speakers, which follows Ethnologue; nor for the grouping into macrolanguages adopted by Ethnologue in accordance with ISO 639-3.

Projects by OBDILCI

  • Indicators for the Presence of Language in the Internet
  • The Languages of France in the Internet
  • French in the Internet
  • Portuguese in the Internet
  • Spanish in the Internet
  • AI and Multilingualism
  • DILINET
  • Pre-historic Projects…