AN INTERESTING AI EXPERIMENT POST MECILDI

Following the announcement of MECILDI and the publication of its preprint, OBDILCI conducted an experiment with eleven AI applications. The experiment had several objectives:

  1. Check to what extent the progress made by OBDILCI in measuring the presence of languages online is reflected in the AIs’ answers.
  2. Assess the potential impact of MECILDI in that field by asking the AIs to react to the preprint and to its potential impact.
  3. Try to “educate” the AIs scientifically in a field dominated by an extremely popular, yet strongly biased, data provider.
  4. Assess whether the improvements an AI shows during one conversation could carry over to future conversations with other users.
  5. Assess the impact of the conversation language on the answers.

The results of the experiment are documented here, with links to each individual conversation.

The AIs’ answers converge to a large degree. They fall into three categories, depending on whether OBDILCI’s data was mentioned in the first answer, in the second answer, or only in the third answer, after OBDILCI had been explicitly mentioned.

The choice of language had no effect in the experiment, except for two AIs. In one of them, DeepSeek, it had a tremendous impact, producing what could be called a macro-hallucination triggered by a massive bias.

THE CONCLUSIONS

  1. The progress is well reflected: in half of the cases from the very first answer, but in 25% of the AIs only after OBDILCI is explicitly referenced.
  2. There is a consensus that MECILDI is a game changer; one AI even called it checkmate.
  3. The education process works well when peer-reviewed data is contrasted with undocumented data.
  4. All the AIs but one claimed they would answer the same question completely differently from then on. Testing proved this false. Some gray areas remain on this question, which appears to be a crucial issue for the future of AI.
  5. Can we reason with an AI using solid arguments and make it shift from a view based on popular consensus to one based on sound scientific principles, even a less popular one? The answer is undoubtedly yes, and thankfully so. This is why using AI correctly requires a good dose of critical thinking.
  6. Changing the language did not modify the answers, except for Copilot and DeepSeek. The singular case of DeepSeek, and the massive anti-Francophonie bias that emerged, raises the following additional questions, some of which remain unanswered.
  7. Can the creators of an AI intentionally bias its responses in a specific direction? Yes, by feeding it data that heavily contains that bias.
  8. Do the Chinese authorities have a well-established anti-Francophone stance?
  9. Did this anti-Francophonie bias in DeepSeek emerge in this experiment as a hallucinatory accident, or is it “programmed” to express itself systematically?