CERVANTES PUBLISHES ITS 2025 REPORT ON SPANISH IN THE WORLD

We welcome the new Cervantes report, “Spanish in the world, 2025”:

We were critical of the 2024 edition of this report. The concerns we raised in 2024 have been fully addressed: this time the report brings original and very interesting data on Spanish, both demographic and economic, and maintains its focus on the digital aspect, especially artificial intelligence. Congrats!

One minor exception remains: unfortunately, Cervantes continues to use W3Techs, and its very biased data, as its source for language data on the Web. This time they mention us in a note that appears intended to justify that choice:

“We chose to present the W3Tech statistics because they are automatically obtained from web pages that are considered relevant for their content and functionality: duplicate, redirected or subdomain sites are excluded, as well as those that only show one page of the server, for example. Other sources use different criteria to quantify the weight of languages on the Internet. Thus, the Observatory of Linguistic and Cultural Diversity on the Internet (OBDILCI), based in Nice (France) and linked to the International Organization of La Francophonie and previously to the Latin Union, states that 20% of the pages on the Internet are in English, 19% in Chinese and 7.7% in Spanish. But, at the same time, OBDILCI has proposed a Cyberglobalization Index (CGI) that includes as factors the number of speakers of the languages, the number of countries where they are spoken and the number of speakers connected to the Internet. This complex index quantifies English with 14.13% of the world’s cyber globalization, French with 9.55% and Spanish with 2.09% (OBDILCI, 2024). Speaker figures are taken from Ethnologue and other sources, with a 20% confidence interval.”

We must rectify or clarify some elements:

– W3Techs does not offer web page data but website data, and the difference is not a detail. To be precise, W3Techs takes the main language detected on the home page as the language of the whole site.

– The relevance criteria mentioned are technical and implicit in any such work. The criteria that go unmentioned, and that matter most, are precisely that 1) the data are extracted from the million most visited websites and 2) W3Techs attributes only one language per website, defaulting to English whenever English is among the languages of a multilingual site.

These two elements, the second being the more striking, explain the significant biases in these data. In reality, the W3Techs figure for English is the percentage of websites that have English as one of their language options, and the figure for each other language is the percentage of websites in that language that do not offer English as an option. Readers interested in more solid data can consult the paper presented at the UNESCO meeting LT4ALL2025, which inventoried all existing studies and analyzed their respective biases, concluding that the percentage of English pages should lie between 20% and 27%, less than half of the W3Techs figure: Inventory and comparisons of all methods for the measure of languages online and implications on Internet’s lingua franca, LT4ALL2025, Paris-UNESCO, February 24-26, 2025. Video (English, 10 min), Video (French, 10 min), PPT

– OBDILCI is not “linked to the International Organization of La Francophonie”. It is true that most of our historical support has come from the French-speaking side, but we have also received significant support from the Portuguese-speaking world, as well as from UNESCO and the Organization of Ibero-American States (see https://obdilci.org/about/sponsors/). In any case, we are a civil society organization, independent of the entities that fund our projects.

– The cyber-globalization index is an indicator that aims to measure the long-term advantages of languages in the digital world, as distinct from the percentage of pages in each language, and there is no contradiction between the two indicators.
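The effect of the single-language attribution rule described above for W3Techs can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the eight toy websites, the function names, the tie-break used when a multilingual site has no English, and the "fractional" alternative, which is neither the W3Techs method nor the OBDILCI one.

```python
from collections import Counter

# Hypothetical toy sample: each set lists the language options of one website.
SITES = [
    {"en"},
    {"en", "es"},
    {"en", "es", "fr"},
    {"es"},
    {"fr", "en"},
    {"zh"},
    {"zh", "en"},
    {"es", "fr"},
]


def single_language_share(sites, default="en"):
    """One language per site; `default` wins whenever it is among the options.

    For multilingual sites without `default`, the alphabetically first
    language is picked (an arbitrary tie-break assumed for this sketch).
    """
    counts = Counter()
    for langs in sites:
        chosen = default if default in langs else min(langs)
        counts[chosen] += 1
    return {lang: 100 * n / len(sites) for lang, n in counts.items()}


def fractional_share(sites):
    """Each site splits its weight equally among all of its languages."""
    counts = Counter()
    for langs in sites:
        for lang in langs:
            counts[lang] += 1 / len(langs)
    return {lang: 100 * v / len(sites) for lang, v in counts.items()}


single = single_language_share(SITES)
fractional = fractional_share(SITES)

for lang in sorted(fractional):
    print(f"{lang}: single={single.get(lang, 0.0):5.1f}%  "
          f"fractional={fractional[lang]:5.1f}%")
```

In this toy sample, the single-language figure for English is exactly the share of sites that offer English as an option (5 of 8), while French disappears entirely because every site offering French is attributed to another language. Counting each language option fractionally gives English a much smaller share of the same sample.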