MAIN PROJECT – V3.0 (March 2022)
Indicators of the Presence of Languages in the Internet
NOTE: This is an archived version of the study. A more up-to-date version is available.
Introduction – V3.0 (March 2022)
Version 3 (March 2022): comprehensive bias reduction and redefinition of some outputs.
More than a new version, this release marks the maturity of the method: all biases are now controlled to an acceptable threshold, and the indicators produced are reliable within a ±20% confidence interval.
The Observatory is pleased to share the results of version 3 of its model for computing indicators of the presence of languages on the Internet which, like version 2 (announced in 2021), covers the 329 languages with more than one million native speakers.
A confidence interval of ±20% may seem wide by the standards of other statistical work, but for data on the place of languages on the Internet, a subject that has always been very hard to measure and prone to chronic misinformation, it is a feat.
All results are available under the CC-BY-SA 4.0 license.
What do the results tell us? The winner is multilingualism.
The Internet's transition from the dominance of European languages (led by English) toward Asian languages and Arabic (led by Chinese) is well advanced, and the winner is multilingualism; African languages, however, are slow to take their place.
Project Summary
A short, peer-reviewed, open-access article presents the V3 results in terms of indicators, together with a synthesis of the method:
Resource: Indicators on the Presence of Languages in Internet, Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, a workshop of LREC2022
Methodological Note
This is an indirect approximation of the space occupied by languages on the Internet, combining several data sources and statistical techniques. All computations and results are based on L1+L2 speakers, where L1 is the mother tongue and L2 the second language(s).
According to our main demolinguistic source (Ethnologue, 24th edition), the world population (L1) and the L1+L2 speaker population are:
L1 = 7 231 699 136
L1+L2 = 10 361 716 756
(L1+L2)/L1 = 1.4328
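The ratio quoted above can be reproduced from the two Ethnologue totals. This is a minimal sketch, assuming L1+L2 is a count of speakers summed over languages (someone who speaks three languages is counted three times). Under that reading the ratio can be interpreted as the average number of languages spoken per person, and the difference as the number of second-language speakers summed across languages. The variable names are illustrative only.

```python
# Speaker totals quoted from Ethnologue (24th edition) in the text above.
L1_TOTAL = 7_231_699_136       # mother-tongue (L1) speakers, i.e. the world population
L1_L2_TOTAL = 10_361_716_756   # L1+L2 speakers, summed over all languages

# Assumed reading: average number of languages spoken per person.
ratio = L1_L2_TOTAL / L1_TOTAL
# Assumed reading: second-language speakers, summed over all languages.
l2_only = L1_L2_TOTAL - L1_TOTAL

print(f"(L1+L2)/L1 = {ratio:.4f}")  # (L1+L2)/L1 = 1.4328
print(f"L2 speakers (summed over languages) = {l2_only:,}")
```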
The confidence interval for all the figures produced is estimated at ±20%.
The detailed methodology has been published in a peer-reviewed, open-access journal: The method behind the unprecedented production of indicators of the presence of languages in the Internet. Frontiers in Research Metrics and Analytics, Volume 8, 2023. doi: 10.3389/frma.2023.1149347
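To make the ±20% window concrete, the sketch below applies it to the two largest content shares in the table. It assumes the window is relative to each figure rather than expressed in absolute percentage points, because an absolute ±20 points would swamp most of the shares in the table. The helper `interval` is hypothetical.

```python
def interval(value_pct, rel=0.20):
    """Bounds of a figure under a relative ±rel confidence window (assumed reading)."""
    return value_pct * (1 - rel), value_pct * (1 + rel)

# % contents (L1+L2) for the two leading languages in the results table.
for lang, share in [("Chinese", 21.60), ("English", 19.60)]:
    low, high = interval(share)
    print(f"{lang}: {share:.2f}% -> [{low:.2f}%, {high:.2f}%]")
```

Under this reading, Chinese spans roughly 17.28% to 25.92% and English 15.68% to 23.52%. Because the two intervals overlap, the order of the top two languages by content may fall within the stated uncertainty.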
Results of the March 2022 Study (V3.0)
All indicators for the 30 languages with the highest content percentage
RANK CONTENTS L1+L2 | ISO CODE | LANGUAGES | % INTERNAUTS L1+L2 | % WORLD POPULATION L1+L2 | % CONN. SPEAKERS | % CONTENTS L1+L2 | % VIRTUAL PRESENCE L1+L2 | % CONTENT PRODUCTIVITY L1+L2 |
---|---|---|---|---|---|---|---|---|
1 | zho | Chinese Macro | 18,46% | 14,72% | 71,38% | 21,60% | 1,47 | 1,17 |
2 | eng | English | 14,83% | 13,01% | 64,86% | 19,60% | 1,51 | 1,32 |
3 | spa | Spanish | 6,79% | 5,24% | 73,72% | 7,85% | 1,50 | 1,16 |
4 | hin | Hindi | 4,19% | 5,80% | 41,16% | 3,76% | 0,65 | 0,90 |
5 | rus | Russian | 3,51% | 2,49% | 80,32% | 3,76% | 1,51 | 1,07 |
6 | fra | French | 2,98% | 2,58% | 65,80% | 3,33% | 1,29 | 1,12 |
7 | por | Portuguese | 2,99% | 2,49% | 68,43% | 3,13% | 1,26 | 1,05 |
8 | ara | Arabic Macro | 3,97% | 3,53% | 63,99% | 3,09% | 0,87 | 0,78 |
9 | jpn | Japanese | 1,99% | 1,22% | 92,63% | 2,66% | 2,18 | 1,34 |
10 | deu | German, Standard | 2,04% | 1,30% | 89,17% | 2,37% | 1,82 | 1,16 |
11 | msa | Malay Macro | 2,36% | 2,36% | 56,93% | 1,96% | 0,83 | 0,83 |
12 | tur | Turkish | 1,17% | 0,85% | 78,05% | 1,14% | 1,35 | 0,98 |
13 | ita | Italian | 0,87% | 0,66% | 75,83% | 1,00% | 1,53 | 1,14 |
14 | kor | Korean | 0,90% | 0,79% | 65,16% | 0,98% | 1,24 | 1,09 |
15 | fas | Persian Macro | 1,08% | 0,81% | 75,91% | 0,88% | 1,09 | 0,82 |
16 | ben | Bengali | 1,11% | 2,58% | 24,55% | 0,88% | 0,34 | 0,79 |
17 | vie | Vietnamese | 0,92% | 0,74% | 70,96% | 0,85% | 1,15 | 0,92 |
18 | urd | Urdu | 0,95% | 2,22% | 24,38% | 0,66% | 0,30 | 0,70 |
19 | tha | Thai | 0,80% | 0,59% | 77,95% | 0,65% | 1,12 | 0,82 |
20 | pol | Polish | 0,60% | 0,39% | 87,09% | 0,63% | 1,59 | 1,04 |
21 | mar | Marathi | 0,69% | 0,96% | 41,06% | 0,58% | 0,60 | 0,83 |
22 | tel | Telugu | 0,68% | 0,92% | 41,69% | 0,56% | 0,60 | 0,82 |
23 | tam | Tamil | 0,61% | 0,82% | 42,15% | 0,51% | 0,62 | 0,83 |
24 | jav | Javanese | 0,62% | 0,66% | 53,76% | 0,44% | 0,66 | 0,70 |
25 | nld | Dutch | 0,38% | 0,24% | 91,14% | 0,41% | 1,73 | 1,08 |
26 | guj | Gujarati | 0,44% | 0,60% | 41,47% | 0,36% | 0,61 | 0,83 |
27 | ukr | Ukrainian | 0,40% | 0,32% | 71,02% | 0,35% | 1,09 | 0,88 |
28 | kan | Kannada | 0,41% | 0,57% | 41,11% | 0,33% | 0,59 | 0,82 |
29 | ron | Romanian | 0,32% | 0,23% | 79,57% | 0,30% | 1,29 | 0,93 |
30 | aze | Azerbaijani Macro | 0,33% | 0,23% | 81,54% | 0,28% | 1,21 | 0,85 |
REMAIN | | | 22,60% | 30,10% | | 15,13% | | |
TOTAL | | | 100,00% | 100,00% | | 100,00% | | |
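The two ratio columns appear consistent with virtual presence = % contents / % world population and content productivity = % contents / % internauts. The sketch below checks this reading on a few rows and confirms that the REMAIN share of contents is close to 100% minus the sum of the 30 ranked languages. Decimal commas are converted to points. The function names are hypothetical. Because the table inputs are themselves rounded, agreement is only expected to within about 0.02.

```python
# Rows copied from the table (decimal commas converted to points):
# (ISO code, % world pop., % internauts, % contents, virtual presence, content productivity)
ROWS = [
    ("zho", 14.72, 18.46, 21.60, 1.47, 1.17),
    ("eng", 13.01, 14.83, 19.60, 1.51, 1.32),
    ("hin", 5.80, 4.19, 3.76, 0.65, 0.90),
    ("jpn", 1.22, 1.99, 2.66, 2.18, 1.34),
    ("ben", 2.58, 1.11, 0.88, 0.34, 0.79),
]

def virtual_presence(contents_pct, population_pct):
    """Assumed definition: share of contents divided by share of world population."""
    return contents_pct / population_pct

def content_productivity(contents_pct, internauts_pct):
    """Assumed definition: share of contents divided by share of internauts."""
    return contents_pct / internauts_pct

# % contents column for the 30 ranked languages, in table order.
CONTENTS_TOP30 = [21.60, 19.60, 7.85, 3.76, 3.76, 3.33, 3.13, 3.09, 2.66, 2.37,
                  1.96, 1.14, 1.00, 0.98, 0.88, 0.88, 0.85, 0.66, 0.65, 0.63,
                  0.58, 0.56, 0.51, 0.44, 0.41, 0.36, 0.35, 0.33, 0.30, 0.28]

if __name__ == "__main__":
    for iso, pop, net, cont, vp_tab, cp_tab in ROWS:
        vp = virtual_presence(cont, pop)
        cp = content_productivity(cont, net)
        print(f"{iso}: VP {vp:.2f} (table {vp_tab}), CP {cp:.2f} (table {cp_tab})")
    # REMAIN should be close to 100% minus the ranked total (rounding explains the gap).
    print(f"remain = {100 - sum(CONTENTS_TOP30):.2f}% (table 15.13%)")
```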