OTHER PROJECTS

WWW Multilingualism Reports

OBDILCI is pleased to announce a new series of studies on the state of multilingualism on the Internet, supported by data from https://Dataprovider.com.

The Dataprovider.com database gathers over 200 parameters about websites, covering nearly the entire WWW ecosystem (800+ million records, including 167 million active websites and 91 million with relevant language information available). Some of these parameters pertain to languages, and others can be cross-referenced with language data.

FIRST REPORT

This first report updates the study made by Ionian University in 2009 on the prevalence of English within European Union ccTLD websites, showing a notable evolution (English drops from 28% to 20%). It provides a first estimate of the global rate of multilingualism of the WWW and reveals interesting correlations between the main language of a site and e-commerce, type of TLD, economic activity and more.

Some intuitive insights, and a few surprises, are now backed by strong data:

  • more than half of Ukrainian, Estonian, Catalan and Greek websites are multilingual;
  • the correlation between sites with a high economic footprint and multilingualism is impressive;
  • Chinese, Korean and Japanese could do better in terms of percentage of multilingual sites;
  • the same goes for Portuguese, which is just below average, unlike Italian, French or Spanish;
  • .org is not as multilingual as one would expect…

Our multilingual gracias to Dataprovider.com for courtesy access to this wealth of data.

SECOND REPORT

Second report of the WebMultilingualism series: Exploring web presence and multilingualism of European minority languages with associated gTLDs.

The conclusions of the study are:

  • Two gTLDs show excellent linguistic performance, both in terms of presence of the minority language and in terms of multilingualism: .cat and .eus; both, especially .eus, have room to increase their presence.
  • Two gTLDs, .cymru and .gal, show fair linguistic performance, together with .lu. The case of .lu, the ccTLD of Luxembourg, is to be treated separately, as the study highlights the low presence of the national language in a context of strong multilingualism.
  • Three gTLDs show medium results, with promise in some factors and difficulties in others: .gal, .gales and .bzh.
  • Two gTLDs, .corsica and .frl, have not reached the point of benefiting either the local language or multilingualism, and seem to remain at the stage of a geographic TLD.
  • Finally, .alsace, .irish and .scot do not show linguistic performance and remain geographic domains with limited penetration and linguistic impact at this stage.

THIRD REPORT

We are pleased to announce the third study on multilingualism conducted using the Dataprovider.com database. This time, the study is in French, for a change, and the topic is: A characterization of the French-speaking Web based on a series of parameters, in comparison with other dominant languages on the Internet.

The focus is on French, but the study also provides data for other languages, and you can view the data for the language of your choice, among the 19 specifically covered.

The fundamental discovery is that each language has its own “thematic signature,” reflecting the culture it embodies, and you can even see the similarities and distances between linguistic Webs.

Thus, the French-speaking Web is close to the Italian Web and extremely distant from the Hindi Web. In addition to the thematic analysis, the following parameters are addressed: economic impact, trust in e-commerce, average site size, average number of inbound links, business orientation and B2B vs. B2C.

The report presents the method, biases and results, and gives a better understanding of the nature of each linguistic Web, developing the French-speaking Web as an example. In passing, it compares the respective proportions of languages in the Dataprovider.com database with the data from the OBDILCI model, which makes it possible to identify biases in favor of or against certain languages.

FOURTH REPORT

Here comes the fourth study on Web multilingualism conducted using the Dataprovider.com database. This time, the study is in Spanish, and the topic is: Una evaluación de la reciprocidad en el uso mutuo del español y del portugués en las Web lusófona e hispanófona (An evaluation of reciprocity in the mutual use of Spanish and Portuguese on the Web of Spanish- and Portuguese-speaking countries).

The main findings of the study are:

  • The rate of multilingualism of websites in Portuguese is below the world average.
  • There are 3 times more websites in Spanish in Portuguese-speaking countries than websites in Portuguese in Spanish-speaking countries. In any case, the percentages are very low, below 1% on average.
  • Brazil is the country with the highest percentage of websites with a Spanish version.
  • Mercosur and Caribbean countries are the Spanish-speaking countries with the highest percentage of websites with a Portuguese version.
  • In the USA, the prevalence of websites with a Portuguese version is 15 times higher than that of websites with a Spanish version when rated in proportion to each language's speaker population.

The recommendations are:

  • Websites in Portuguese should increase their multilingualism.
  • Websites in Spain should increase the percentage of Portuguese versions.
  • Spanish speakers in the USA should increase their virtual presence.

FIFTH REPORT

This report in English addresses: Web Multilingualism analyzed by ccTLD, languages, gTLDs and much more: winners and losers

Summary of findings:

SIXTH REPORT

Wikimedia Multilingualism. This report uses Wikimedia's outstanding language statistics to explore the most multilingual application on the Web and to show which languages perform best in each element and overall. It also presents the many existing online encyclopedias.
