MAIN PROJECT – V1.0 (2017)

Indicators for the Presence of Language on the Internet

NOTE: This is an archived version of the study. A more up-to-date version is available.

Project Summary – V1.0 (2017)

This observatory measured the presence of the Latin languages, English, and German on the Internet between 1997 and 2007. After a 10-year hiatus caused by the evolution of search engines, we are back, thanks to the support of the International Organization of La Francophonie and together with MAAYA, with a new method for producing indicators for the 140 languages with more than 5 million speakers.

The method, the results, and a discussion of the biases involved in measuring languages on the Internet can be read in: An alternative approach to produce indicators of languages in the Internet, June 2017. The full set of results for the 140 languages can be consulted below.

A short five-page introductory version was prepared for the LT4ALL international conference, Language Technologies for All: Enabling Linguistic Diversity and Multilingualism Worldwide, UNESCO, Paris: Indicators of Languages in the Internet, November 2019.

Six Indicators

  • Internet users: Relates to the speakers of each language who have access to the Internet (Internet-connected persons). A single micro-indicator (provided by the ITU) meets that need and serves as a fundamental source for the remaining work.
  • Usage: Relates to subscriptions to applications or to means of connection to the Internet. Eleven micro-indicators are involved in the construction of this indicator.
  • Traffic: Indicates the traffic generated by users of applications. Three hundred and sixteen micro-indicators are used to construct this indicator.
  • Indexes: Relates to country rankings in various aspects of the information society. Five micro-indicators are currently used to construct this indicator.
  • Contents: Relates to Web content in each language; for the moment, it mainly gathers data from the Wikimedia galaxy. Thirteen micro-indicators provide data for this indicator.
  • Interfaces and translation of languages: Refers to the presence of languages in application interfaces or as translation languages. Twenty-three micro-indicators build this indicator.

Four Macro-Indicators

  • Power of the language on the Internet, which measures the global share of the language on the Internet, computed as the average of the six previous indicators;
  • Capacity of the language on the Internet, measured by the ratio between the power and the percentage of the world's speakers who speak that language;
  • Gradient, measured by the ratio between the power and the percentage of speakers connected to the Internet;
  • Productivity of the language on the Internet in terms of content creation, measured by the ratio between the percentage of content in that language and the percentage of Internet users in the same language.
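
The four definitions above can be sketched in a few lines of Python. This is only an illustration, not the study's implementation: the function names and all input values are hypothetical, and the gradient is computed under the assumption that "the percentage of speakers connected to the Internet" means the language's share of the world's Internet users (the same quantity as the Internet users indicator).

```python
from statistics import mean


def power(indicator_shares):
    """Power: plain average of the six indicator shares (each in % of the world total)."""
    if len(indicator_shares) != 6:
        raise ValueError("expected exactly six indicator shares")
    return mean(indicator_shares)


def ratio(numerator_pct, denominator_pct):
    """Ratio of two percentages; used for capacity, gradient and productivity."""
    if denominator_pct <= 0:
        raise ValueError("denominator percentage must be positive")
    return numerator_pct / denominator_pct


# Hypothetical shares for one imaginary language (NOT values from the study).
shares = {
    "internet_users": 10.0,
    "usage": 12.0,
    "traffic": 14.0,
    "indexes": 11.0,
    "contents": 15.0,
    "interfaces": 16.0,
}
world_speakers_pct = 6.5  # hypothetical share of the world's speakers

p = power(list(shares.values()))
capacity = ratio(p, world_speakers_pct)
# Assumption: "speakers connected" is read as the Internet users share.
gradient = ratio(p, shares["internet_users"])
productivity = ratio(shares["contents"], shares["internet_users"])

print(f"power={p:.2f} capacity={capacity:.2f} "
      f"gradient={gradient:.2f} productivity={productivity:.2f}")
```

Under this reading, a capacity or productivity above 1 means the language is better represented on the Internet than its share of speakers or of Internet users alone would suggest.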

Results of the 2017 Study (V1.0)

LANGUAGE      INTERNET USERS   CONTENT   PRODUCTIVITY
English       22.2%            32.0%     1.44
Chinese       20.5%            18.0%     0.88
Spanish       9.1%             8.0%      0.88
French        5.6%             6.5%      1.17
German        3.1%             3.8%      1.21
Russian       5.0%             3.5%      0.71
Portuguese    4.0%             3.5%      0.88
Japanese      3.4%             3.5%      1.04
Arabic        4.2%             3.0%      0.72
Hindi         3.9%             3.0%      0.77
Malay         2.6%             2.5%      0.96
Polish        1.7%             1.8%      1.09
Korean        1.4%             1.4%      1.01
Bengali       1.5%             1.3%      0.86
Italian       0.9%             1.1%      1.23
Urdu          0.8%             0.7%      0.84
REMAINING     35.3%            31.4%     0.89
TOTAL         125.0%           125.0%

Note that the totals exceed 100% in order to account for multilingualism: the additional 25% corresponds to people counted again under a second language.
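
As a quick consistency check, the productivity column can be recomputed from the two percentage columns of the table above. The sketch below is an illustration only (the variable and function names are hypothetical); it divides the content share by the Internet users share for each row and compares the result with the published value. Small differences are expected, since the published percentages are rounded to one decimal place.

```python
# (language, Internet users %, content %, published productivity), copied from the table.
ROWS = [
    ("English", 22.2, 32.0, 1.44),
    ("Chinese", 20.5, 18.0, 0.88),
    ("Spanish", 9.1, 8.0, 0.88),
    ("French", 5.6, 6.5, 1.17),
    ("German", 3.1, 3.8, 1.21),
    ("Russian", 5.0, 3.5, 0.71),
    ("Portuguese", 4.0, 3.5, 0.88),
    ("Japanese", 3.4, 3.5, 1.04),
    ("Arabic", 4.2, 3.0, 0.72),
    ("Hindi", 3.9, 3.0, 0.77),
    ("Malay", 2.6, 2.5, 0.96),
    ("Polish", 1.7, 1.8, 1.09),
    ("Korean", 1.4, 1.4, 1.01),
    ("Bengali", 1.5, 1.3, 0.86),
    ("Italian", 0.9, 1.1, 1.23),
    ("Urdu", 0.8, 0.7, 0.84),
    ("REMAINING", 35.3, 31.4, 0.89),
]


def productivity(users_pct, content_pct):
    """Productivity: content share divided by Internet users share."""
    return content_pct / users_pct


# Absolute gap between the recomputed ratio and the published one, per language.
gaps = {
    name: abs(productivity(users, content) - published)
    for name, users, content, published in ROWS
}
max_gap = max(gaps.values())

# Column totals; both should be close to the 125% shown in the TOTAL row.
users_total = sum(users for _, users, _, _ in ROWS)
content_total = sum(content for _, _, content, _ in ROWS)

for name, users, content, published in ROWS:
    print(f"{name:<11} recomputed={productivity(users, content):.2f} published={published:.2f}")
print(f"largest gap: {max_gap:.3f}")
print(f"totals: users={users_total:.1f}% content={content_total:.1f}%")
```

The recomputed ratios agree with the published column to within a few hundredths, and both percentage columns sum to roughly 125%.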