OBDILCI

MAIN PROJECT 2: MECILDI

Our main mission is to produce indicators of the presence of languages and of multilingualism on the Internet.

The first main project, which started in 2017 and reached maturity in 2022, created a model capable of producing indicators for 362 languages. This model is updated at least once per year.

The second main project (MECILDI), started in 2025, aims to provide a program capable of measuring language presence and multilingualism indicators in any targeted series of websites. This program makes it possible to assess the results of the model and to open new lines of research by applying it to different series, for instance specific ccTLDs or the TRANCO list of the one million most visited websites. Unlike most comparable existing programs (such as W3Techs), MECILDI will take into account the fact that a website may hold more than one language, thus removing the huge bias of the other methods, which is documented in this peer-reviewed reference.

This section focuses on MECILDI. If you are interested in the MODEL, switch to MAIN PROJECT 1: MODEL.

MECILDI: PROJECT SUMMARY

The data obtained from the OBDILCI model describe languages on the Internet as a whole: the method does not allow for a targeted analysis of a particular subset, such as a specific country or group of countries.

Furthermore, historical research conducted to develop indicators of linguistic diversity has provided scientifically documented evidence against the methods proposed by marketing firms, which lack the necessary scientific rigor and whose strong bias in favor of English has fueled, and continues to fuel, chronic misinformation regarding the share of English on the Web. The most significant biases in these sources stem from their failure to account for multilingualism on websites (see this article); at the same time, these sources obscure the Web’s strong multilingualism, which is growing rapidly (see this section) thanks to the contributions of artificial intelligence tools.

These circumstances have led OBDILCI to adopt the traditional method used by influential but biased sources: algorithmic language detection directly on a sample of websites presumed to be representative of the entire Web. However, unlike these superficial methods, MECILDI will bring the necessary rigor to the consideration of multilingualism. This ambitious new tool will also enable OBDILCI to broaden its scope of study through targeted analysis of specific segments of the Internet, defined according to geographic or thematic criteria.

The MECILDI program will be capable of scanning a wide range of websites, applying to each one a language detection algorithm selected for its reliability and coverage. This tool, combined with a broad range of identification techniques, will make it possible to extract the linguistic distribution of the targeted websites in percentage terms, as well as other indicators related to multilingualism. Taking into account the multilingual nature of a significant proportion of websites represents a complex technical challenge that is the focus of this project.
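To illustrate how a linguistic distribution in percentage terms can account for multilingual websites, here is a minimal sketch. It is not the MECILDI method: the function names and the weighting rule (each site counts for one unit, split equally across the languages it holds) are assumptions chosen only to show why counting a single language per site distorts the result.

```python
from collections import Counter


def language_shares(site_languages):
    """Return the percentage share of each language over a set of sites.

    Assumption (hypothetical rule, not MECILDI's): every site weighs one
    unit, divided equally among the distinct languages detected on it.
    Sites with no detected language are ignored.
    """
    totals = Counter()
    counted_sites = 0
    for languages in site_languages.values():
        distinct = set(languages)
        if not distinct:
            continue
        counted_sites += 1
        for language in distinct:
            totals[language] += 1.0 / len(distinct)
    if counted_sites == 0:
        return {}
    return {lang: 100.0 * weight / counted_sites for lang, weight in totals.items()}


def multilingual_rate(site_languages):
    """Return the percentage of sites holding more than one language."""
    detected = [set(langs) for langs in site_languages.values() if langs]
    if not detected:
        return 0.0
    multilingual = sum(1 for langs in detected if len(langs) > 1)
    return 100.0 * multilingual / len(detected)


# Hypothetical detection results for four sites.
SAMPLE = {
    "a.example": ["en"],
    "b.example": ["en", "fr"],
    "c.example": ["fr"],
    "d.example": ["en", "es"],
}
```

With this sample, `language_shares(SAMPLE)` gives 50% for English, 37.5% for French and 12.5% for Spanish, whereas a method that recorded only one language per site and picked English whenever it is present would report 75% for English.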

Initially, MECILDI will be able to shed light on the actual prevalence of English on the web by using the same technique as W3Techs, but without the significant bias inherent in those data. Subsequently, MECILDI will provide original, targeted results capable of guiding, on a factual basis, digital strategies and public policies for languages and multilingualism in cyberspace, beginning with the linguistic domains of the languages of France.

The project is currently supported by the DGLFLF. This support has enabled the development of an initial, simpler version that focuses on the most common technique for multilingual websites (the hreflang attribute) and relies on data extrapolation. This version is currently being tested and should yield initial results in the coming weeks. Greater support is needed to develop the full version capable of identifying all multilingual techniques on websites and extracting their linguistic distribution—a major technical challenge.
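As an illustration of the hreflang technique mentioned above, the following sketch reads the language versions that a page declares through `<link rel="alternate" hreflang="...">` elements, using only the Python standard library. It is an assumption-laden example, not the MECILDI code: the class and function names are hypothetical, the `lang` attribute of `<html>` is added as the page's own language, and hreflang declarations made in HTTP headers or sitemaps are not covered.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collect the <html lang> value and the hreflang alternates of a page."""

    def __init__(self):
        super().__init__()
        self.page_lang = None
        self.alternates = {}  # hreflang code -> URL of that language version

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html" and attributes.get("lang"):
            self.page_lang = attributes["lang"].lower()
        elif tag == "link":
            rel_values = (attributes.get("rel") or "").lower().split()
            hreflang = attributes.get("hreflang")
            href = attributes.get("href")
            if "alternate" in rel_values and hreflang and href:
                self.alternates[hreflang.lower()] = href


def declared_languages(html):
    """Return the sorted primary language codes declared by a page.

    Region subtags are dropped ("pt-BR" becomes "pt") and the special
    "x-default" value, which names a fallback page, is ignored.
    """
    parser = HreflangParser()
    parser.feed(html)
    codes = {code.split("-")[0] for code in parser.alternates if code != "x-default"}
    if parser.page_lang:
        codes.add(parser.page_lang.split("-")[0])
    return sorted(codes)
```

Declared languages are only a claim made by the site; a fuller pipeline would presumably still run language detection on each alternate URL to confirm it.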

In any case, the method and its results will be detailed in an article published in a peer-reviewed scientific journal. The results are expected to confirm the findings of OBDILCI, which estimate that the percentage of web pages in English globally falls within the 20%–27% range (see the study presented at the UNESCO/LT4ALL meeting in 2025).

Projects by OBDILCI

  • Indicators for the Presence of Languages and multilingualism in the Internet
  • The Languages of France in the Internet
  • French in the Internet
  • Portuguese in the Internet
  • Spanish in the Internet
  • Web Multilingualism reports
  • Courses
  • AI and Multilingualism
  • Linguistic gTLDs
  • DILINET
  • Pre-historic Projects…
  • Digital Language Death