1ST MECILDI RUNS 4/2026

April 2026: This is a historic moment for anyone interested in languages@internet. For the first time, a language detection program has been designed with due consideration for web multilingualism, correcting the strong bias toward English of previous methods.

Here are the first results of MECILDI version 1, applied to the Tranco list of the one million most visited websites. They make it possible to correct the most persistent disinformation about the proportion of English content.

Guess what? All of our main documented predictions are confirmed:

  • 22% is the correct figure for the percentage of webpages in English in the Tranco list (not 50% as sourced here), a figure forecast by our model.
  • The “unbias” equation for W3Techs figures proposed in our article is confirmed: unbiased figure = biased figure / multilingualism rate (22% ~= 56%/3).
  • The analysis of all existing methods, presented at UNESCO/LT4ALL, which forecast a figure for English between 20% and 27%, is confirmed.
  • The multilingualism figures are slightly higher than expected (see results).
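
The “unbias” equation in the list above can be checked with a few lines of arithmetic. This is only a minimal sketch: the function names are hypothetical, and it assumes that the multilingualism rate is the average number of languages per website, which the text does not spell out. The inputs (56%, 3 and 22%) are the rounded figures quoted above.

```python
def unbiased_share(biased_share: float, multilingualism_rate: float) -> float:
    """Apply the equation: unbiased figure = biased figure / multilingualism rate."""
    if multilingualism_rate <= 0:
        raise ValueError("multilingualism_rate must be positive")
    return biased_share / multilingualism_rate


def implied_rate(biased_share: float, unbiased_share_value: float) -> float:
    """Solve the same equation for the multilingualism rate."""
    if unbiased_share_value <= 0:
        raise ValueError("unbiased_share_value must be positive")
    return biased_share / unbiased_share_value


# Rounded figures quoted in the text (percentages).
w3techs_english = 56.0    # biased figure for English
approx_rate = 3.0         # rounded multilingualism rate
measured_english = 22.0   # English share reported for the Tranco list

estimate = unbiased_share(w3techs_english, approx_rate)
rate = implied_rate(w3techs_english, measured_english)

print(f"56% / 3 = {estimate:.1f}%")                 # 56% / 3 = 18.7%
print(f"rate implied by 22% and 56%: {rate:.2f}")   # rate implied by 22% and 56%: 2.55
```

With these rounded inputs the equation gives about 18.7%, and matching 22% exactly would take a rate of about 2.55, so the “~=” above is best read as an order-of-magnitude agreement rather than an exact identity.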

Let’s hope this will stop the disinformation about English being the lingua franca of the Internet.

One side conclusion of this first result tempers the enthusiasm of owning such a powerful new tool. It clearly appears that extrapolating figures from the one million most visited websites to the whole Web makes no sense. Despite the genuine effort of the Tranco authors to reduce the possible technical biases of the underlying sources (Majestic, Cisco Umbrella, Quantcast), the selection of most visited websites seems to be flawed at the root in terms of geographic, and therefore linguistic, bias. Those sources predominantly target websites from Western countries, which rules out any extrapolation to the whole Web, as the figures are highly biased against most non-European languages. Why does Chinese represent less than 3% of content despite having the highest percentage of connected speakers (17.6% of Internet users speak Chinese, more than the 15.5% who speak English)? Probably because the most visited websites selected in Tranco are not those visited by the Chinese themselves but those visited by Chinese speakers from the diaspora… We will try to analyse this point further, but at this stage we consider OBDILCI’s model much more trustworthy than the extrapolation of these results to the whole Web.
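
To make the size of this gap concrete, the sketch below divides each language's share of Tranco content by its share of connected speakers, using only the percentages quoted above. The “representation ratio” and the function name are hypothetical, introduced here purely for illustration; they are not an OBDILCI indicator. Chinese content is only given as “less than 3%”, so its ratio is an upper bound.

```python
def representation_ratio(content_share: float, speaker_share: float) -> float:
    """Illustrative ratio: share of content divided by share of connected speakers.

    A value of 1.0 would mean content is proportional to connected speakers.
    """
    if speaker_share <= 0:
        raise ValueError("speaker_share must be positive")
    return content_share / speaker_share


# Percentages quoted in the text.
english_ratio = representation_ratio(22.0, 15.5)
# "Less than 3%" of content, so 3.0 yields an upper bound for Chinese.
chinese_ratio_upper_bound = representation_ratio(3.0, 17.6)

print(f"English: {english_ratio:.2f}")                          # English: 1.42
print(f"Chinese: below {chinese_ratio_upper_bound:.2f}")        # Chinese: below 0.17
```

On these figures English is over-represented in the Tranco sample relative to its connected speakers, while Chinese sits at less than a fifth of proportional representation, which is consistent with the geographic bias described above.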

Notes:

  • Version 1 of MECILDI, funded by DGLFLF, only computes the most standard method for multilingualism and extrapolates complete figures based on a prevalence assumption.
  • Version 2, funded by OIF, will be completed by the end of 2026 and will process all possible methods (this is a complex matter).
  • The results obtained for the one million most visited sites should not be extrapolated as such to the whole Web; this approach strongly favors European languages over all other languages. As for English, it will be less prevalent in the whole Web than in the most visited sites but, at the same time, the rate of multilingualism will be considerably lower. Since the two effects work against each other, it is impossible to offer a figure at this stage, and we stick to our previous conclusion: the percentage of webpages in English in the whole Web lies within the 20%-27% window.

What’s next?

  • We will document the method, the results, and the related considerations in full detail in a peer-reviewed paper for a well-regarded journal.
  • We will apply MECILDI to a set of 7 gTLDs for languages of France and produce reports on the language distribution and multilingualism indicators for those domains.
  • We will pursue the design and development of version 2, then apply it again to Tranco and also to a set of 10 ccTLDs of Francophonie member countries in the South.
  • Further applications of MECILDI may follow, to better understand the linguistic shape and multilingualism of the Internet and its evolution over time.
