The Observatory of the Linguistic and Cultural Diversity in the Internet is pleased to announce several important milestones in its effort to measure linguistic and cultural diversity on the Internet.
1. New Domain Name
We now have a domain name and a logo designed to express our mission.
2. Methodology Published in Peer-Reviewed Journal
The detailed description of the methodology behind our model, which creates indicators of Internet presence for 342 languages, has been published in a peer-reviewed, open-data, and well-regarded journal, Frontiers in Research Metrics and Analytics: “The method behind the unprecedented production of indicators of the presence of languages in the Internet”. The reference is Front. Res. Metr. Anal. Sec. Research Methods Volume 8 – 2023. doi: 10.3389/frma.2023.1149347.
This is a key milestone for this project.
3. Version 4 of Study Released
The model, which has produced innovative indicators since 2017, has now reached version 4, with its demo-linguistic figures updated from Ethnologue dataset #26 of March 2023 and a further improvement in our quest for bias reduction (check here).
The initial progress detected for African languages in version 3.2, following the update of the ITU figures for people connected per country, is confirmed in version 4 and may mark the start of the inclusion of African languages in the new era of multilingualism in which the Internet is embedded (see the evolution of the cyber-geography of language families across the versions here).
4. Database Access Now Available
Database access to the results produced by the model has passed testing and is now online at https://obdilci.org/Base.
5. Preprint of New Article
We would also like to share the preprint of a new article, which will soon be peer reviewed: “Is it true that more than half the web contents are in English? If web multilingualism is paid due attention then no!”, doi: 10.13140/RG.2.2.20798.70724. We hope this simple but effective demonstration will help counter biased figures that deny the extraordinary growth of multilingualism on the Web over the last decade.
It is time to thank once again the organizations that have made this project and its subsequent progress possible.
Acknowledgements
The project has been conducted by OBDILCI, in collaboration, starting from version 2 in April 2021, with the UNESCO Chair on Language Policies for Multilingualism, of which OBDILCI is a member. The project has been funded by the Organisation de la Francophonie for the pre-studies, version 1, and version 3, and by the Brazilian Ministry of Foreign Affairs via the Instituto da Lingua Portuguesa for version 2. The implementation of the database access and the open-access publication in Frontiers in Research Metrics and Analytics have been funded by the Délégation générale à la langue française et aux langues de France, of the French Ministry of Culture, together with the Permanent Delegation of Brazil to UNESCO.
Sources
The indicators of the presence of languages in the Internet are produced by a model developed by OBDILCI which makes use of the following sources:
For demo-linguistic data, Ethnologue, a proprietary source updated once a year.
For the percentage of people connected to the Internet, ITU and the World Bank, both public sources updated once a year.
For figures about languages or countries on the Internet, a wide variety of public direct sources (such as Imminent T-Index or legacy.socialprogress.org) and some indirect sources (such as similarweb.com applied to a series of websites), all of which are listed at https://www.frontiersin.org/articles/10.3389/frma.2023.1149347/full#supplementary-material.
Products
The indicators produced by OBDILCI are available under a CC-BY-SA 4.0 license, either as Excel files, downloadable here, or through database queries at https://obdilci.org/Base. The results from Version 3.0 are fully described in the peer-reviewed, open-data article “Resource: Indicators on the Presence of Languages in Internet”, presented in 2022 at the biennial event organized by the European Language Resources Association, SIGUL2022/LREC2022. The following indicators are produced, and sorted, for each of the 341 languages with more than one million L1 speakers:
a: Percentage of world L1+L2 population
b: Percentage of world L1+L2 connected population
c: Percentage of L1+L2 speakers connected to the Internet
d: Percentage of web contents
e: Virtual presence (d divided by a)
f: Content productivity (d divided by b)
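As a minimal sketch of how the two derived indicators follow from the others, the ratios e and f can be computed as shown below. The function name and the sample percentages are hypothetical and chosen only for illustration; they are not OBDILCI results.

```python
def derived_indicators(a: float, b: float, d: float) -> tuple[float, float]:
    """Return (e, f) from the percentages a, b and d defined in the list above.

    a: percentage of world L1+L2 population
    b: percentage of world L1+L2 connected population
    d: percentage of web contents
    """
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive percentages")
    e = d / a  # virtual presence
    f = d / b  # content productivity
    return e, f


# Hypothetical language (illustrative values, not real data):
# 2% of world speakers, 2.5% of connected speakers, 3% of web contents.
e, f = derived_indicators(a=2.0, b=2.5, d=3.0)
print(f"virtual presence: {e:.2f}, content productivity: {f:.2f}")
```

With these illustrative inputs, both ratios are above 1, meaning the language's share of web contents exceeds its share of world speakers and of connected speakers.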
In version 4, the following world values have been reached:
L1+L2 population: 10 598 681 424
L1 population: 7 403 726 853
World multilingualism rate: 1.43
Coverage of world L1+L2 population by the 341 languages of the study: 96.04%
Coverage of contents by the 341 languages of the study: 98.00%
L1+L2 world percentage of connected persons: 66.66%
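The world multilingualism rate listed above appears to be the ratio of the L1+L2 population to the L1 population. That reading is an assumption, since the formula is not stated here, but it reproduces the published figure. The variable names are illustrative:

```python
# World values for version 4, as listed above.
L1_PLUS_L2_POPULATION = 10_598_681_424
L1_POPULATION = 7_403_726_853

# Assumed definition: average number of languages (L1 or L2) per person.
multilingualism_rate = L1_PLUS_L2_POPULATION / L1_POPULATION

print(f"{multilingualism_rate:.2f}")  # prints 1.43
```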
For More Information:
French, Portuguese, and Spanish versions of the cited papers are available in the corresponding language versions of the website.