
OBDILCI
MAIN PROJECT 2: MECILDI

Our main mission is to produce indicators of the presence of languages and of multilingualism on the Internet.
The first main project, started in 2017 and mature since 2022, produced a model capable of generating indicators for 362 languages. This model is updated at least once a year.
The second main project (MECILDI), started in 2025, aims to provide a program capable of measuring language presence and multilingualism indicators in any targeted series of websites. This program makes it possible to assess the results of the model and to open new lines of research by applying it to different series, for instance specific ccTLDs or the TRANCO list of the one million most visited websites. Unlike most comparable existing programs (such as W3Techs’), MECILDI will take due account of the fact that a website may hold more than one language, thus removing the huge bias of the other methods documented in this peer reviewed reference.
This section focuses on MECILDI. If you are interested in the model, switch to MAIN PROJECT 1: MODEL.
MECILDI: PROJECT SUMMARY
The data produced by the OBDILCI model describe languages on the Internet as a whole: the method does not allow for a targeted analysis of a particular subset, such as a specific country or group of countries.
Furthermore, past research conducted to develop indicators of linguistic diversity has provided scientifically documented evidence against the methods proposed by marketing firms, which lack the necessary scientific rigor and whose strong bias in favor of English has fueled, and continues to fuel, chronic misinformation about the share of English on the Web. The most significant biases in these sources stem from their failure to account for multilingual websites (see this article). They thereby obscure the reality of the Web’s strong multilingualism, which is growing rapidly (see this section) thanks to the contributions of artificial intelligence tools.
These circumstances have led OBDILCI to adopt the traditional method used by influential but biased sources: algorithmic language detection directly on a sample of websites presumed to be representative of the entire Web. However, unlike these superficial methods, MECILDI will bring the necessary rigor to the consideration of multilingualism. This ambitious new tool will also enable OBDILCI to broaden its scope of study through targeted analysis of specific segments of the Internet, defined according to geographic or thematic criteria.
The MECILDI program will be capable of scanning a wide range of websites, applying a language detection algorithm—selected for its reliability and coverage—to each one. This tool, combined with a broad range of identification techniques, would make it possible to extract the linguistic distribution of the target audience in percentage terms, as well as other indicators related to multilingualism. Taking into account the multilingual nature of a significant proportion of websites represents a complex technical challenge that is the focus of this project.
Initially, MECILDI will be able to shed light on the actual prevalence of English on the web by using the same technique as W3Techs, but without the significant bias inherent in those data. Subsequently, MECILDI will provide original, targeted results capable of guiding, on a factual basis, digital strategies and public policies for languages and multilingualism in cyberspace, beginning with the linguistic domains of the languages of France.
The project is currently supported by the DGLFLF. This support has enabled the development of an initial and simpler version that focuses on the most common technique for multilingual websites (the hreflang attribute) and relies on data extrapolation. This version is currently being tested and should yield initial results in the coming weeks. Greater support is needed to develop the full version capable of identifying all multilingual techniques on websites and extracting their linguistic distribution—a major technical challenge.
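As an illustration of the hreflang technique mentioned above, here is a minimal Python sketch that lists the languages a page declares through `<link rel="alternate" hreflang="...">` tags. It is not the MECILDI code: the class and function names, the sample page and the choice to keep only the primary language subtag are assumptions made for the example.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collect hreflang values from <link rel="alternate" hreflang="..."> tags."""

    def __init__(self):
        super().__init__()
        self.languages = set()

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        hreflang = attributes.get("hreflang")
        if "alternate" in rel and hreflang:
            # Assumption: keep only the primary language subtag ("fr-FR" -> "fr").
            primary = hreflang.lower().split("-")[0]
            if primary != "x":  # skip the "x-default" fallback entry
                self.languages.add(primary)


def declared_languages(html):
    """Return the sorted list of languages declared via hreflang in a page."""
    parser = HreflangParser()
    parser.feed(html)
    return sorted(parser.languages)


# Hypothetical sample page used only for this example.
sample = """
<html><head>
<link rel="alternate" hreflang="en" href="https://example.org/en/">
<link rel="alternate" hreflang="fr-FR" href="https://example.org/fr/">
<link rel="alternate" hreflang="x-default" href="https://example.org/">
</head><body></body></html>
"""
print(declared_languages(sample))  # ['en', 'fr']
```

A sketch like this only sees languages that a site declares explicitly; sites that implement multilingualism through other techniques would need the broader detection that the full version is meant to provide.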
In any case, the method and its results will be detailed in an article published in a peer-reviewed scientific journal. The results are expected to confirm the findings of OBDILCI, which estimate that the percentage of web pages in English globally falls within the 20%–27% range (see the study presented at the UNESCO/LT4ALL meeting in 2025).
APRIL 2025 : MECILDI V1 is now developed, tested and operational.
A series of runs has been carried out, both to check and validate the method and the program, and to provide relevant data on languages in the one million most visited websites as a basis for estimating the proportion of languages on the whole Web.
- RUN 1: 5/4/2026, applied to the TRANCO series of 11/2025.
- RUN 1.1: 7/4/2026, the same run with the correction of an error in the percentage of websites having an English version (57.9%). Percentage of web pages in English = 22.1%; rate of multilingualism = 3; percentage of multilingual websites = 33.8%; average number of languages per multilingual website = 7; percentage of sites using embedded Google Translate = 1.2%.
- FACTOR SENSITIVITY ANALYSIS: 8/4/2026. The main source of bias in the method is the extrapolation factor used to project the full results. a) A heuristic confirms the choice of 40% as the correct basis. b) Modelling changes of this value over a wide range confirms that the English percentage remains inside the 20%–27% window. The impact of the other factors on the results is marginal.
- RUN 2: 11/4/2026, applied to the TRANCO series of 4/4/2026, confirms and strengthens confidence in the main results. There are few differences in the main indicators and main languages (often within the confidence interval). Most differences occur, logically, in the error rates and the less dominant languages. English trends slightly downward (websites with an English version / web pages in English: 56% / 21.8% vs. 58% / 22.1%).
- RUN 3: 13/5/2026, a final test conducted to validate the statistical approach. A new randomly generated set of 100 x 1000 sites is submitted. 97.8% of the new results fall within the confidence interval of the initial results, and for the 5 results out of 240 that show a greater difference, the gap remains marginal (0.05%). This final test confirms the statistical approach and concludes the measurement campaign.
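The page does not state how the confidence intervals are computed. As a rough sketch only, assuming each of the 100 batches of 1000 sites yields one share per language and that a normal approximation is acceptable, a 99% interval and the RUN 3 style check could look like this in Python. The function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics

Z_99 = 2.576  # two-sided 99% quantile of the standard normal distribution


def mean_and_ci99(batch_shares):
    """Return (mean, half-width of the 99% confidence interval) across batches."""
    mean = statistics.mean(batch_shares)
    std_err = statistics.stdev(batch_shares) / math.sqrt(len(batch_shares))
    return mean, Z_99 * std_err


def within_interval(new_value, mean, half_width):
    """Check whether a new result falls inside mean +/- half_width."""
    return abs(new_value - mean) <= half_width


random.seed(0)
# Synthetic data, not MECILDI results: 100 batches, each value standing for
# the share of English pages measured on one batch of 1000 sites.
batches = [random.gauss(0.22, 0.03) for _ in range(100)]
mean, half = mean_and_ci99(batches)
print(f"{mean:.2%} +/- {half:.2%}")
print(within_interval(mean + half / 2, mean, half))
```

Repeating the `within_interval` check for every indicator of a fresh random sample, and counting how many pass, gives a percentage comparable in spirit to the 97.8% reported for RUN 3.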
| % OF WEB PAGES IN | VALUE IN TRANCO SERIES | 99% CONFIDENCE INTERVAL (±) |
| English | 21.77% | 0.79% |
| German | 6.93% | 0.24% |
| French | 6.38% | 0.24% |
| Spanish | 6.36% | 0.22% |
| Italian | 4.13% | 0.16% |
| Portuguese | 3.86% | 0.15% |
| Russian | 3.86% | 0.16% |
| Dutch | 3.17% | 0.13% |
| Japanese | 2.93% | 0.11% |
| Chinese | 2.77% | 0.13% |
| Polish | 2.57% | 0.10% |
| Indonesian | 1.79% | 0.10% |
| Turkish | 1.76% | 0.10% |
| Swedish | 1.74% | 0.11% |
| Korean | 1.62% | 0.09% |
| Arabic | 1.60% | 0.10% |
| Czech | 1.52% | 0.09% |
| Danish | 1.41% | 0.10% |
| Finnish | 1.28% | 0.10% |
| Romanian | 1.26% | 0.08% |
| Ukrainian | 1.24% | 0.10% |
| Hungarian | 1.23% | 0.08% |
| Modern Greek | 1.10% | 0.08% |
| Vietnamese | 1.09% | 0.07% |
This table covers languages in the one million most visited websites according to the TRANCO series. This series relies on sources (Majestic, QuantCast, Cisco Umbrella) that are strongly biased in favor of the major Western countries, which inflates the figures for the main European languages (German, French, Spanish…). These figures do not reflect the real proportion of languages on the whole Web, where the percentages of non-European languages, in particular that of Chinese, would be considerably higher.
In any case, the differences between these figures and those of W3Techs (computed on the same series) result from the fact that W3Techs does not account for the multilingualism of websites and counts a single language per site, whereas we count all language versions.
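The effect of the two counting rules can be seen on a toy example. The Python sketch below uses invented data (three hypothetical sites with made-up page counts per language version) and simplified function names; it does not reproduce either W3Techs' or MECILDI's actual procedure, only the difference between counting one language per site and counting every language version.

```python
from collections import Counter

# Hypothetical toy data: site -> (primary language, {language: number of pages})
sites = {
    "a.example": ("en", {"en": 100, "fr": 100, "de": 100}),
    "b.example": ("en", {"en": 50, "es": 50}),
    "c.example": ("zh", {"zh": 200}),
}


def single_language_shares(sites):
    """Share of each language when every site counts for one language only."""
    counts = Counter(primary for primary, _ in sites.values())
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def all_versions_shares(sites):
    """Share of each language when pages of all language versions are counted."""
    pages = Counter()
    for _, versions in sites.values():
        pages.update(versions)  # adds the page counts per language
    total = sum(pages.values())
    return {lang: n / total for lang, n in pages.items()}


single = single_language_shares(sites)
multi = all_versions_shares(sites)
print(f"English, one language per site: {single['en']:.1%}")  # 66.7%
print(f"English, all language versions: {multi['en']:.1%}")   # 25.0%
```

In this toy set, two of the three sites have English as their primary language, so the first rule gives English two thirds of the total, while English pages are only a quarter of all pages once every language version is counted.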
The details for the more than 200 languages included in the TRANCO study are openly accessible (CC-BY-SA 4.0) below, along with some figures on the state of web multilingualism.
The study of the multilingualism of some of the gTLDs of France has been completed with MECILDI, and the results can be consulted below.
At the bottom of the page, technical information lets webmasters check how the MECILDI robot respects the websites it explores.


