MAIN PROJECT – METHODS

Compilation/evaluation of all identified methods

COMPARISON BETWEEN DIFFERENT APPROACHES TO MEASURING THE PROPORTION OF LANGUAGES ONLINE, AND A HISTORICAL FLASHBACK

We will try to keep this study up to date. If you manage an approach not listed here, or if you know about another approach, please send us information via the contact page.

So far, five different approaches have been identified, from companies, universities or civil society organizations, that offer, as of today, figures about the proportion of languages online. This document presents them and tries to draw conclusions from their similarities, differences and possible biases.

Methods and figures from the initial period, from 1997 until 2007, are presented briefly at the end of this document. Details are exposed and analyzed in UNESCO's Twelve years of measuring linguistic diversity in the Internet: balance and perspectives, D. Pimienta, D. Prado, A. Blanco, 2009.

By the end of the first period, around 2007, evidence existed that the proportion of English on the Web was already around 50%. Today, all evidence converges toward a figure within the 20%-30% window, in spite of popular but biased figures claiming "over 50%"…

I – CURRENT APPROACHES FOR MEASURING LANGUAGES ONLINE

APPROACH 1: W3TECHS

Source: https://w3techs.com/technologies/overview/content_language

Method: Daily application of a language detection algorithm to the one million most visited websites, as listed by TRANCO.

Type: Internet service company specialized in web technology surveys

Span: Daily since 2011

Exposed methodology: partial (https://w3techs.com/technologies)

Peer reviewed methodology: no

Bias discussion: no

Confidence interval of the figure: Not available

Number of languages covered: 40

Our diagnosis: We suspect that they assign a single language per website, by default English whenever the site is multilingual and includes English as an option. This triggers a strong bias favoring English. There are other biases to consider (for instance, extrapolating from the one million most visited websites to the whole Web favors English and European languages), but the main bias, not taking into account the multilingualism of websites, may lead to an overrating of English on the order of 100% (see the demonstration at https://obdilci.org/projects/main/englishweb/ and the illustrative sketch at the end of this approach's entry).

Conclusion: The data produced daily by W3Techs is the percentage of websites, among the one million most visited, that have English as one of their linguistic options, together with the percentages of 40 other languages for the websites where English is not one of the linguistic options. Due to its long history, and to the fact that the company is considered quite reliable for its surveys of web technologies, it has long been, in spite of this huge bias, the main reference on the subject of languages online for many, including policy makers and researchers, which is a real issue of misinformation.
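To make the size of this bias tangible, here is a minimal numerical sketch in Python with invented site counts (the real demonstration is at the URL above; this is not W3Techs' code). It contrasts the suspected one-language-per-site counting with a counting that credits every linguistic version of a site.

```python
# Minimal sketch (hypothetical numbers) of the multilingualism bias.

sites = (
    [["en"]] * 10                 # English-only sites
    + [["zh", "en"]] * 40         # bilingual sites whose main language is Chinese
    + [["es", "en"]] * 40         # bilingual sites whose main language is Spanish
    + [["fr"]] * 10               # French-only sites
)

# Suspected W3Techs-style counting: one language per site, English whenever present.
one_per_site = ["en" if "en" in langs else langs[0] for langs in sites]
share_single = 100 * one_per_site.count("en") / len(one_per_site)

# Counting that respects multilingualism: every linguistic version counts once.
versions = [lang for langs in sites for lang in langs]
share_multi = 100 * versions.count("en") / len(versions)

print(f"one language per site: English {share_single:.1f}%")            # 90.0%
print(f"all versions counted : English {share_multi:.1f}%")             # 50.0%
print(f"overrating of English: {share_single / share_multi - 1:.0%}")   # 80%
```

The overrating grows with the proportion of multilingual sites that include English, and reaches 100% when the rate of multilingualism of the sample is 2.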

APPROACH 2: DATAPROVIDER.COM

Source: https://www.dataprovider.com/blog/domains/what-languages-does-the-web-speak/
Method: One-time application of a language detection algorithm to, allegedly, close to all existing websites (99M out of the 136M valid websites whose data is stored in their database).
Type: Internet service company specialized in data analysis
Span: Once in January 2023
Exposed methodology: No. However, they kindly answered all our questions, which allows us to share the following description. They explore, using https://github.com/jmhodges/gocld3 for detection (a model identifying a little more than 100 languages), the whole universe of websites (in 2023, 710 million websites, of which 136M were found valid). Note that their figures are highly consistent with Netcraft statistics (https://www.netcraft.com/blog/october-2024-web-server-survey/). In 2023, they applied language detection to a subset of 99M websites, which is 73% of the total, filtered by country (at that stage only 62 countries were included, which is less than 30%). They do keep the information about the various linguistic versions, when specified in the hreflang= HTML instruction; however, it was not yet used in the 2023 published statistics, which therefore account for only one language, the main language of each website.
Peer reviewed methodology: no
Bias discussion: Besides the same multilingualism bias which applies to W3Techs, there is a bias resulting from the countries excluded from the selection.
Confidence interval of the figure: Not available
Number of languages covered: 107
Our diagnosis: This is a very interesting and promising approach, since this company holds a database covering the whole universe of websites (today, 163M valid out of 856M total) and has the potential to apply, partially, a measurement that takes into consideration the multilingualism of websites (partially because, according to our studies, the hreflang parameter is used by only 40% of multilingual websites). At this stage, the measurement results have to be taken with the same caution as those of W3Techs. If the country selection bias is set aside, the data could confirm that the global percentage of websites having English as one of their versions is comparable to, although slightly lower than, that of the Tranco sample analyzed by W3Techs, which makes sense. Indeed, it is probable that many European languages, English included, have a higher probability of being among the most visited websites. This approach is to be followed with interest, as it has the potential for improvement towards bias-controlled results, thanks also to the transparency of the company, which we praise and appreciate.
Conclusion: Should the company invest in a new campaign, this time including the information from part of the multilingual sites, it would be possible to mitigate the two remaining biases (see the sketch after these two points):
1) The selection bias: given the list of excluded countries, the extrapolation of the missing data is possible from the combination of the Internet connection rate by country and the speakers of each language in each country, resulting in the percentage of connected speakers, for each language, excluded from the data. OBDILCI can provide, from those percentages, a multiplicative correction to be applied to each language counter and thereby mitigate the bias. This will obviously not remove the bias totally for all languages, but it will reduce it in large proportion (note that this method is widely used in the OBDILCI model to complete partial statistics).
2) The multilingual residual bias: from our approximate statistics, only 40% of multilingual websites use the hreflang instruction to specify the list of linguistic options. Not taking 60% of the cases into account is a large bias. It can however be reduced drastically by simply multiplying all the language counters by 100/60 = 5/3, under the assumption that the pattern obtained with the 40% will roughly replicate for the rest. Obviously, the assumption may be wrong, but the result will be much less biased with that correction. Doing so, DATAPROVIDER.COM could produce the best approximation ever of the proportion of languages in web contents and, with the same logic, could produce the first serious approximation of a key piece of data, the rate of multilingualism of the Web, to be compared with the same value for humans (the definition of this indicator is: the total number of linguistic versions identified divided by the total number of websites analyzed). OBDILCI and DATAPROVIDER.COM have planned discussions at the beginning of 2025 to examine possibilities of cooperation towards unbiased figures.
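Here is a minimal sketch, with invented counters and correction factors, of how those two mitigations could be combined; it illustrates the logic only and is not DATAPROVIDER.COM's or OBDILCI's actual computation.

```python
# Minimal sketch (invented numbers) of the two corrections discussed above,
# applied to per-language counters.

main_language_counts = {"en": 40_000_000, "zh": 9_000_000, "es": 5_000_000}    # hypothetical
hreflang_extra_versions = {"en": 2_500_000, "zh": 1_200_000, "es": 1_800_000}  # hypothetical

# 2) Multilingualism residual bias: hreflang covers only ~40% of multilingual
#    sites, so the extra linguistic versions are extrapolated (here the 5/3
#    factor proposed above; 100/40 = 2.5 would be another reading of the same
#    replication assumption).
EXTRAPOLATION_FACTOR = 5 / 3

# 1) Selection bias: per-language multiplicative corrections, derived elsewhere
#    from the connected speakers of the excluded countries (hypothetical values).
selection_correction = {"en": 1.02, "zh": 1.30, "es": 1.10}

corrected = {}
for lang, main_count in main_language_counts.items():
    versions = main_count + hreflang_extra_versions.get(lang, 0) * EXTRAPOLATION_FACTOR
    corrected[lang] = versions * selection_correction.get(lang, 1.0)

total = sum(corrected.values())
for lang, value in corrected.items():
    print(f"{lang}: {100 * value / total:.1f}%")   # shares over this toy language set
```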

APPROACH 3: NETSWEEPER

Source: https://www.netsweeper.com/government/top-languages-commonly-used-internet
Method: They claim to apply a language detection algorithm to 12 billion webpages.
Type: Internet service company specialized in web filtering for security purpose
Span: Once in June 2023
Exposed methodology: No. No answer to our various attempts to communicate.
Peer reviewed methodology: no.
Bias discussion: Not provided. Should the claim be confirmed that they work on webpages instead of websites, the multilingualism bias would be overcome. A single bias would remain to be analyzed in the NETSWEEPER method, the selection bias. Twelve billion webpages could represent 30% of the whole webpage universe, which is a high figure; yet, depending on how that selection is made, the resulting bias could vary from almost none to large! If the pages are selected randomly, the bias is almost null. If the selection is made on all websites but restricted to a subset of the pages of each website, the bias could again vary between null and the same multilingualism bias, should the selection favor pages belonging to the English version. The fact that they compute English at around 25% is possibly a signal that this bias is controlled. However, in the absence of information on the process, this remains undecidable for the moment.
Number of languages covered: 47
Our diagnosis: If the claim reflects reality, this is a method free of the multilingualism bias, applied to a substantial portion of the Web (nobody really knows the number of webpages; figures of around 40 billion are given by https://www.worldwidewebsize.com), and 12 billion would represent 30% of that universe. If the selection bias is almost null or mitigated by some technique, this could become the most promising result on the subject. The coincidence with many of OBDILCI's figures is striking and argues for a controlled selection bias; however, without further information on the method, this remains a hypothesis.
Conclusion: It is a pity they never answered our several requests for methodological information about what remains a serious candidate for the best method for measuring language proportions in web contents. We used the contact form on their website twice and directly emailed the CTIO in March 2024, and obtained no response. Hopefully, in the future, we will get that information and be in a position to conclude the diagnosis.
Netsweeper staff, if by any chance you read this webpage please contact us.

APPROACH 4: IONIAN UNIVERSITY, GREECE

Source: https://doi.org/10.3390/fi12040076
Method: They focus on the ccTLDs of the European Union to measure English content. They use a language detection algorithm on 100,000 websites. They avoid the multilingualism bias by crawling all internal links.
Type: Visual arts academic department
Span: Once in June 2019
Exposed methodology: Yes, fully transparent
Bias discussion: No
Confidence interval of the figure: Not available
Number of languages covered: 1, English
Our diagnosis: This is a totally reliable experiment, but restricted to European ccTLDs. It can nevertheless serve as an indicator of the likely range of the English proportion globally.
Conclusion: This is the first welcome incursion of the academic world into this matter in a long time. The study has all the attributes of robustness of academic peer-reviewed work. However, it targets a defined subset of the Web, and the results cannot be generalized to the whole Web. Still, this is another argument for the claim that the stable W3Techs figure of English above 50% since 2011 is simply absurd. Is there any reason why the average percentage of English in the websites of the ccTLDs of the European Union before Brexit (including the English-speaking countries UK, Ireland and Malta) would be so much lower than in the whole Web universe? We see no reason; on the contrary.

APPROACH 5: OBDILCI MAIN METHOD

Source: https://www.obdilci.org/projects/main/
Method: This is an indirect method based on the collection and organization of multiple indicators. It cannot really be considered a measurement; it is rather a realistic approximation based on solid assumptions and sources, a subset of which imply biases that are thoroughly discussed.
Type: Civil Society Organization, working in that subject since 1998
Span: Since 2017, once or twice per year
Exposed methodology: Yes. Totally transparent in peer reviewed article https://doi.org/10.3389/frma.2023.1149347
Peer reviewed methodology: Yes see the previous URL
Bias discussion: Yes, very detailed and comprehensive, see the previous URL.
Confidence interval of the figure: large, ±20% (estimated, not computed)
Number of languages covered: 361
Our diagnosis: This indirect approach rests on solid data for the number of L1 and L2 speakers of each language per country (Ethnologue), the percentage of persons connected per country (ITU), and the assumption that there is a natural economic law linking demand (connected speakers of a language) and supply (contents in that language), whose modulation depends on a large set of factors represented by the largest possible set of trustworthy indicators (traffic, subscriptions, presence of languages in interfaces and tools, information-society readiness…). There is a simplifying assumption (all language communities of the same country share the same percentage of connectivity) which is the main bias and the reason why the model is limited to languages with large speaker populations (L1 > 1M). It is not a measurement, but it provides sound plausibility for figures within a large confidence interval, and until another method is validated as bias-controlled, it remains a serious approximation, covering many more languages than the other methods.
Conclusion: Interested and critical minds could legitimately ask: how could such a method approximate reality just by averaging many indicators? Knowing that it is based on a very theoretical hypothesis (the existence of an unknown law linking internauts per language and web contents per language), could this unknown law be indirectly described, so as to allow approximate yet reliable figures, by collecting multiple indicators and processing them statistically, mainly through weighting operations?
We would like to give an intuitive answer to that reasonable questioning. One of the most impressive mathematics classes received at pre-PhD level invited the students to create the equation of a wave reaching the shore of a beach. The physics is very complex, but the professor claimed that the students did not need to know anything about the physics to obtain an approximate but relevant equation! How come? The height of the wave reaching the shore is the result of the swell reaching a progressively reduced depth; just list all the parameters involved: period of the swell, height of the swell, curve of the depth of the shore… and combine them so as to be coherent with their dimensions (distance in meters, height in meters, speed in meters per second, period in seconds, etc.). Create the simplest equation whose resulting dimension is compatible with the result: the height of the wave is a value in meters, so the mathematical combination of factors has to be too. The equation you obtain has every probability of being a first representation of reality. And it does work! More information on that technique at https://en.wikipedia.org/wiki/Dimensional_analysis.
Here we are in a different context: not complex physics but big data and statistics. In an ideal world, all languages are equal and the law is linear: in terms of world percentages, there is as much content as there are speakers in each language. The ratio we call productivity of contents (percentage of contents divided by percentage of connected speakers) is then equal to one for each language; it is a linear equation. We obtain that baseline by weighting the speakers matrix (languages vs. countries) with the connectivity vector (percentage of persons connected per country). In reality, many factors modulate this ratio above or below 1, depending on the language and on the countries where its speakers access contents: tariffs, bandwidth, digital education, e-government applications, business environment, technological readiness of the language, presence in major applications, and so on. If you can get indicators for all those parameters, there is a good chance, provided only big numbers are processed (languages with large numbers of speakers), that your "constructed statistical equation" will be a reasonable approximation. Note that a large proportion of the factors depend on countries rather than languages, but the existence of the languages-per-country matrix allows us to play the game, with some simplifications, which certainly introduce biases; such biases may however become marginal with big numbers.
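As an illustration of that baseline step, here is a minimal sketch with invented figures of the weighting of a speakers matrix (languages vs. countries) by a connectivity vector, and of the resulting productivity ratio; it is not the OBDILCI model itself, which combines many more indicators.

```python
# Minimal sketch (invented figures) of the baseline described above: weight a
# speakers matrix (languages x countries) by a connectivity vector to get the
# share of connected speakers per language, then compare it to a content share.

speakers_m = {  # millions of L1+L2 speakers per language and country -- hypothetical
    "english": {"US": 280, "IN": 200, "NG": 110},
    "hindi":   {"IN": 600},
    "spanish": {"MX": 125, "ES": 47, "US": 42},
}
connectivity = {"US": 0.92, "IN": 0.50, "NG": 0.42, "MX": 0.76, "ES": 0.94}  # ITU-like rates

# Simplifying assumption of the model: every language community of a country
# shares that country's connectivity rate.
connected = {
    lang: sum(pop * connectivity[country] for country, pop in per_country.items())
    for lang, per_country in speakers_m.items()
}
total_connected = sum(connected.values())
connected_share = {lang: 100 * v / total_connected for lang, v in connected.items()}

# Ideal linear law: content share equals connected-speaker share (productivity = 1).
# With an estimated content share, productivity deviates above or below 1:
content_share = {"english": 50.0, "hindi": 12.0, "spanish": 38.0}  # hypothetical
productivity = {lang: content_share[lang] / connected_share[lang] for lang in speakers_m}

print(connected_share)
print(productivity)  # > 1: proportionally more content than connected speakers
```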
Obviously, if the OBDILCI model results could be confirmed by some real measurement, provided it is bias-controlled, this would enhance trust in them…

APPROACH 6: OBDILCI MECILDI PRE-STUDIES

Source: https://obdilci.org

Method: This is a manual effort applied to a series of ten times 100 sites taken randomly from the TRANCO list. We manually checked all the languages of each website, and how the linguistic options are implemented, both in the interface and in the HTML source, in order to study the strategies and tactics to be taken into account in a future exploring approach based on language detection (see approach 7). We took the opportunity to approximate a key indicator, totally unknown at this stage: the rate of multilingualism of the Web, defined as the total number of linguistic versions divided by the total number of websites (the same rate for the whole of humanity is measured at 1.443, following Ethnologue, and we expect the Web to have a larger figure). This figure is the key to evaluating the size of the bias of not considering web multilingualism: for instance, if its value is 2, then the bias is a 100% overrating of the English proportion. A first approximation, from the hand exploration of 1000 random websites from the Tranco list, is around 2 (with high variance, so this is to be taken with caution).

Type: Civil Society Organization

Span: Twice in 2022 and 2024

Exposed methodology: Yes. Totally transparent.

Peer reviewed methodology: Yes in https://doi.org/10.30564/fls.v6i5.7144

Bias discussion: Yes

Confidence interval of the figure: no

Number of languages covered: English only

Our diagnosis: This is just an intermediate input, created by human exploration of a limited subset of the Web, as a trend indication reported with mean and variance. It is part of the MECILDI project.

APPROACH 7: MECILDI@OBDILCI

Source: https://obdilci.org

Method: OBDILCI plans to create a new tool in 2025: software allowing language detection over a series of websites, with due and systematic consideration of the fact that websites can be multilingual. This tool will serve various projects and will be tested first using the Tranco list. Pre-studies have started to determine strategies and tactics to fully reflect the languages of multilingual websites. This is a complex problem due to the variety of solutions implemented in websites, many of them not reflected directly in the visible source code. The pre-studies have allowed us to determine some approximate statistics and data which will be useful for bias mitigation: percentage of websites using the lang= attribute, percentage of websites using the hreflang= attribute, percentage of websites using embedded Google Translate, percentages for the placement of the language options in the interface (top, side, bottom, indirectly via a country option, in a configuration page), coding patterns used for multilingualism… The complexity calls for a combination of techniques and approaches, probably including some AI. Our computing power being limited, we opted for a statistical approach: instead of analyzing all websites, we will create 100 random samples of 1000 websites and use the statistical distribution of the results to obtain the average, variance and confidence interval for each language and for the rest of the parameters (a sketch of this statistical treatment follows this approach's entry).

Type: Civil Society Organization

Span: Future (2025)

Exposed methodology: Will be

Peer reviewed methodology: Will be.

Bias discussion: Will be

Confidence interval of the figure: Will be computed by statistical method

Number of languages covered: 141, the languages which are present in both the OBDILCI model and Google Translate; in other words, the subset of the 250 Google Translate languages which have more than one million L1 speakers. Why so? Because for languages with a low number of speakers, the chosen statistical approach would not provide serious results.

Our diagnosis: This is a project to be realized in 2025, opening avenues for new research.
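As an illustration of the planned statistical treatment, here is a minimal sketch using synthetic detection results (the language mix, function names and sample handling are invented, not MECILDI's code): it draws 100 samples of 1000 sites and derives the mean, standard deviation and a confidence interval for the English share.

```python
# Minimal sketch (synthetic data, not the MECILDI tool) of the statistical
# approach described above: 100 random samples of 1000 websites each, then
# mean, variance and a 95% confidence interval for one language's share.

import random
import statistics

random.seed(0)

def detect_languages() -> list:
    """Stand-in for the real multilingual detection step (hypothetical language mix)."""
    roll = random.random()
    if roll < 0.25:
        return ["en"]                                             # English-only site
    if roll < 0.60:
        return [random.choice(["zh", "es", "fr", "ar"]), "en"]    # bilingual with English
    return [random.choice(["zh", "es", "fr", "ar", "hi", "pt"])]  # monolingual, not English

def english_share(sample_size: int = 1000) -> float:
    versions = [lang for _ in range(sample_size) for lang in detect_languages()]
    return 100 * versions.count("en") / len(versions)

shares = [english_share() for _ in range(100)]   # 100 samples of 1000 sites
mean = statistics.mean(shares)
stdev = statistics.stdev(shares)
ci95 = 1.96 * stdev / (len(shares) ** 0.5)       # normal-approximation CI of the mean

print(f"English share: {mean:.1f}% +/- {ci95:.1f}% (95% CI), sample stdev {stdev:.1f}")
```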

II – COMPARISONS OF RESULTS FOR ENGLISH

Source                      English share
W3TECHS (1/2023)            57.7%
DATAPROVIDER (1/2023)       51%
NETSWEEPER (6/2023)         26.3%
IONIAN Univ. (2020)         28.4%
OBDILCI Main (5/2023)       20%
MECILDI pre-study (5/2024)  29%

COMPARISON OF THE TOP LANGUAGES

Rank  W3TECHS 11/2024    DATAPROVIDER 1/2023   NETSWEEPER 6/2023   OBDILCI 5/2024
1     English 49.4%      English 51.3%         English 26.3%       English 20.4%
2     Spanish 6%         Chinese 10.?%         Chinese 19.8%       Chinese 18.9%
3     German 5.6%        German 7.3%           Spanish 8.1%        Spanish 7.7%
4     Japanese 5%        Spanish 3.9%          Arabic 5%           Hindi 3.8%
5     French 4.4%        Japanese 3.7%         Portuguese 4%       Russian 3.7%
6     Russian 4%         French 3.4%           Malay 3.4%          Arabic 3.7%
7     Portuguese 3.8%    Russian 2.8%          French 3.3%         French 3.4%
8     Italian 2.7%       Portuguese 2.7%       Japanese 3%         Portuguese 3.1%
9     Dutch 2.1%         Dutch 2.0%            Russian 2.8%        Japanese 2.2%
10    Polish 1.8%        Italian 1.9%          German 2.1%         German 2.2%
11    Turkish            -                     Korean              Malay
12    Persian            -                     Turkish             Bengali
13    Chinese            -                     Italian             Turkish
14    Vietnamese         -                     Romanian            Italian
15    Malay              -                     Persian             Vietnamese

WHAT DO THOSE COMPARISONS TELL US?
  1. Caution is highly recommended when reading figures on the percentage of languages on the Web, especially when they refer to English, since there is no agreement between the different results.
  2. Two different versions of the English percentage seem to emerge: one around 50% and another around 25%. Could that be the 100% overestimation due to the non-consideration of the multilingualism of many websites, explained in the link mentioned above? Yes! The fact that DATAPROVIDER.COM exposed figures in 2023 which did not yet use the multilingual data they collected supports that hypothesis.
  3. All those results converge, with high probability, toward a percentage of English in the whole Web, paying due attention to the multilingualism of websites, of around 25%. Check https://www.obdilci.org/projects/main/englishweb/ if you want to understand why.
  4. Why is Chinese so low with W3Techs? The result claimed by W3Techs for Chinese, less than 2%, knowing that it is the first language of the Internet in terms of users, is absolutely not credible, as we have already stated in various publications. Where does the real value lie, between 10% and 20%? As Chinese is probably used in many bilingual (Chinese, English) sites, the same rule may apply, and the DATAPROVIDER.COM figure may need to be multiplied by 2, in which case we would have a consensus around 20%. During the MECILDI pre-studies we discovered that a high proportion of Chinese websites (50% in our sample!) set the lang= parameter to English instead of Chinese. Could that be the explanation of the W3Techs error? Using that parameter, when it is specified, instead of applying language detection seems, at first sight, a valid decision to spare CPU resources.
  5. So far, Netsweeper could be considered the most reliable result, as its method targeting webpages instead of websites prevents the multilingualism bias, and they claim to cover 12 billion webpages, a figure which could represent 30% of the webpage universe, following the estimate of https://www.worldwidewebsize.com. Unfortunately, they have not answered our many requests for information. The assumption that they explore a large proportion of the webpage universe is plausible but would require confirmation; anyway, without further information, the selection bias issue remains undecidable. It is notable, however, how close those results are to OBDILCI's approximation. The main differences concern the languages of India (Hindi, Bengali, Urdu), whose presence in contents could be overestimated by OBDILCI or underestimated by Netsweeper. This point deserves attention given the importance of India in demographic terms. Based on a solid study made in 2017 by KPMG, whose conclusion is that Indian internauts tend to use their local languages more and more to navigate, we maintain our figures, but we need to investigate why the content in those languages is estimated so much lower by the other approaches.
  6. It is interesting to compare the predictions from OBDILCI with the measurements from DATAPROVIDER.COM for languages with low levels of content. We have noticed some striking coincidences (Galician and Basque) as well as some widely divergent figures (Afrikaans, Haitian Creole, Irish and the Indian languages). The country selection bias could be an explanation to be investigated.

III – FIRST PERIOD INITIATIVES (1996-2011)

It may be interesting to highlight the approaches developed in the previous period of the Web, from 1998 to 2011. For more, read the following article, which covers that period in more detail: "Twelve years of measuring linguistic diversity on the Internet: balance and perspectives", D. Pimienta, D. Prado, A. Blanco, UNESCO CL/2009/WS1.

We will just mention each project briefly, in chronological order.

Xerox Study (1996-2000)

Method: Linguistic approach based on the occurrence of frequent words in the corpus.

Source: Grefenstette, G.; Nioche, J. Estimation of English and non-English Language Use on the WWW. Technical Report, Xerox Research Centre Europe, 2000. https://arxiv.org/abs/cs/0006032

Span: One-shot method, not replicated. It was the first historical attempt.

Discussion: Offers percentages, for a few languages, relative to English.

OBDILCI/Funredes (1998-2007)

Method: Use the capacity, reliable at that time, of search engines to report the number of occurrences of a string of characters in the whole set of indexed webpages. Use a comparative vocabulary selected with extreme care regarding syntactic and semantic correspondence, together with bias analysis, for a selected set of languages: English, French, Spanish, Italian, Portuguese, Catalan, Romanian and German. Use statistical techniques to derive results in terms of the percentage of each language compared to English. The percentage of English is then approximated by various techniques.

Source: Historical site of the Observatory https://funredes.org/lc/english/inicio/

Span: Several measurements were organized in the period 1998-2007, showing a decline of English from 80% to 50% and the general growth of non-English European languages. It was the second historical attempt and, together with LOP, the only one maintaining observations over a long period.

Discussion: The evolution of search engines, which made those figures totally unreliable after 2007, marked the end of that method (and of many other projects around the world making use of that exceptional capacity for counting words or expressions on the Web). OBDILCI/Funredes pursued its mission until 2017, when Funredes stopped its activities, with contributions in the field mainly oriented towards French and Spanish, and the search for a new approach, which emerged in 2012 from Daniel Prado's idea of measuring through a large collection of indicators and transforming country indicators into language indicators by cross-checking with demo-linguistic data. This new method matured in 2017 and became bias-controlled in 2022.

ISOC Quebec/Alis Technologies, followed by OCLC (1997, 1999, 2002)

Method: A series of websites is obtained by random generation of 8000 IP numbers. A language detection algorithm is applied to this series and percentages are computed. This method is not statistically valid, as the statistical requirement to obtain reliable results is to avoid a one-time shot and to launch several runs, say 100 repetitions of the same operation, and to apply statistical laws to the obtained distribution (average, variance, confidence interval). This method was replicated identically twice, in 1999 and 2002, with the same flaw. The three measurements provided the same score of 80% for English, stable over 5 years, which, with good marketing, fed the misinformation of the period, until UNESCO publications made the media switch to the 50% value.

Span: Three unique shots in 1997, 1999, 2002.

Sources:

https://web.archive.org/web/20010810234537/http://alis.isoc.org/palmares.en.html
https://www.researchgate.net/publication/271903988_How_World_Wide_Is_the_Web
https://www.dlib.org/dlib/april03/lavoie/04lavoie.html

INKTOMI (2000)

A search engine, INKTOMI, announced, with huge marketing force, its measurement of languages on the Web in 2000. They presented the top 10 languages, with English heading the list at 86%. A telling detail that few observers seemed to notice: the percentages summed to 100% despite the fact that many other languages were left out of the figure! This lacked the most elementary mathematical seriousness…

Google: Method of the complement of an empty space (1998-2008)

This is the name we gave to a facility discovered by accident in March 1998, with AltaVista, and later replicated by Google, which served to reveal the size, by language, of the search engine's index in those times. By making a request to the search engine of the type " -ggfdgfdyugfgvdgdv", where the first term is empty and the second is a string of characters that appears in no webpage, the resulting number of occurrences was the total number of indexed webpages. If a language was set first, then the response was the number of pages in that language. The figure given by Google with that method was of the same order as our method at that time, close to 51% for English in 2008, and Chinese was already around 9%, a figure that W3Techs sets at less than 2% today. Several publications then pretended to have computed the webpages per language while simply copying the results of that simple method without giving the source.

Language Observatory Project – LOP (2003-2011)

Method: Application of language detection to portions of the Web, typically the ccTLDs of countries where local languages were the target. This project, a consortium of universities led by Nagaoka University, carried all the hope of finally seeing this important subject located where it deserves to be, in the research community, within a consortium framework. The common membership of Funredes/OBDILCI and LOP in the MAAYA network (World Network for Linguistic Diversity) was furthermore a promise of fruitful cooperation. This cooperation strengthened in late 2010, when LOP gave Funredes the data for the exploration of Latin American ccTLDs, with close interaction to assess the material; however, the catastrophic tsunami that occurred in 2011 in Japan provoked, among other dramas, the brutal end of this promising project.

Sources:
https://dl.acm.org/doi/10.1145/1062745.1062833
https://en.wikipedia.org/wiki/Language_observatory

UPC/IDESCAT (2003-2006)

The Universitat Politècnica de Catalunya, with the Statistical Institute of Catalonia (IDESCAT), organized a database of 2 million websites to check the presence of Catalan with language detection, and presented results quite close to those of Funredes/OBDILCI in 2005 and not so close in 2006.

Source: https://raco.cat/index.php/LlenguaUs/article/view/128275/177480

IV – A PLAUSIBLE CURVE OF THE EVOLUTION OF ENGLISH CONTENTS ONLINE

To conclude, we present a quite plausible curve of the evolution of the proportion of English online.
Extracted from the French publication "Une histoire très brève de l'observation des langues dans l'Internet", in Culture et Recherche, No. 143, Autumn-Winter 2022, "La recherche culturelle à l'international", pages 128-131.
Image source: https://www.obdilci.org/wp-content/uploads/2024/04/EnglishWeb.jpg
