Wikipedia's value in the age of generative AI

If there were a generative artificial intelligence system that could, on its own, write all the information contained in Wikipedia, would it be the same as Wikipedia today?

This might seem like a philosophical question, but it’s now a very practical one due to recent advances in generative artificial intelligence and large language models (LLMs). Because of widespread adoption of generative AI technology designed to predict and mimic human responses, it is now possible to nearly effortlessly create text that seems a lot like it came from Wikipedia.

My answer to the question is simple: No — it would not be the same.

The process of freely creating knowledge, of sharing it, and of refining it over time, in public and with the help of hundreds of thousands of volunteers, has for 20 years fundamentally shaped Wikipedia and the many other Wikimedia projects. Wikipedia contains trustworthy, reliably sourced knowledge because it is created, debated, and curated by people. It’s also grounded in an open, noncommercial model, which means that Wikipedia is free to access and share, and it always will be. And in an internet flooded with machine-generated content, this means that Wikipedia becomes even more valuable.

In the past six months, the public has been introduced to dozens of LLMs that can read, summarize, and generate text, each trained on vast data sets. Wikipedia is one of the largest open corpora of information on the internet, with versions in over 300 languages. To date, every LLM is trained on Wikipedia content, and it is almost always the largest source of training data in their data sets.

An obvious thing to do with one of these new systems is to try to generate Wikipedia articles. Of course, people have tried it. And, as I’m sure many readers have experienced firsthand, these attempts highlight many challenges for using LLMs to produce what Wikipedians call knowledge, which is trustworthy, reliably sourced encyclopedic writing and images. Some of these shortcomings include:

Output from LLMs isn’t currently fact-checked, and there are already well-publicized instances of people using generative AI to try to do their own jobs. There are tons of low-stakes situations, like prompts for thank-you notes, plans for a fun vacation, or outlines to start an essay, where the outputs are helpful and not harmful. However, there are other situations where it’s not so good – like in the instance when an LLM fabricated court cases, and the lawyer who used the answers in a real courtroom was ultimately fined. In another situation, a doctor demonstrated that a generative AI system would give poor diagnoses when provided with symptoms from patients seen in an emergency room. Over time, my guess is that these systems will get much better, and become more reliably sourced in a variety of contexts. An exciting possibility is that demand for better sourcing will improve access to research and books that can be used online. But getting there will take time, and probably significant pressure from regulators and the public to improve in ways that benefit all people.
LLMs can’t use information they haven’t been trained on to respond to prompts. This means that all the books of the world that aren’t available in full text online, content from pre-internet research, and information in languages other than English aren’t part of what a typical LLM “knows”. As a result, the data sets used to train LLMs today can amplify existing inequities and bias in many areas – like hiring, medicine, and criminal sentencing. Maybe someday this will change, but we’re pretty far off from being able to freely access and then train LLMs on all the different kinds of information people in every language currently use to write for Wikipedia. And even then, additional work will be needed to mitigate bias.
Finally, it’s been shown that LLMs trained on the output of LLMs become measurably worse, and even forget things they once “knew”, an affliction named “model collapse”. What this means is that for LLMs to be good and to get better, they’ll need a steady supply of original content, written by humans, making Wikipedia and other sources of human-generated content even more valuable. It also means the world’s generative AI companies need to figure out how to keep sources of original human content, the most critical element of our information ecosystem, sustainable and growing over time.

These are just some of the problems that need to be solved as internet users explore how LLMs can be used. We believe that internet users will place increasing value on reliable sources of information that have been vetted by people. Wikipedia’s policies and our experiences from more than a decade of using machine learning to support human volunteers offer worthwhile lessons in this future.

Principles for the use of generative AI

Machine-generated content and machine learning tools aren’t new to Wikipedia and other Wikimedia projects. At the Wikimedia Foundation, we have developed machine learning and AI tools around the same principles that have made Wikipedia such a useful resource to so many: by centering human-led content moderation and human governance. We continue to experiment with new ways to meet people’s knowledge needs in responsible ways, including with generative AI platforms, aiming to bring human contribution and reciprocity to the forefront. Wikipedia editors are in control of all machine-generated content – they edit, improve, and audit any work done by AI – and they create policies and structures to govern machine learning tools that are used to generate content for Wikipedia.

These principles can form a good starting point for the use of current and emerging large language models. To start, developers of LLMs should consider how their models support people in three key ways:

Sustainability. Generative AI technology has the potential to negatively impact human motivation to create content. To preserve that motivation and encourage more people to contribute their knowledge to the commons, LLMs should look to augment and support human participation in growing and creating knowledge. They should not ever impede or replace the human creation of knowledge. This can be done by always keeping humans in the loop and properly crediting their contributions. Not only is continuing to support humans in sharing their knowledge in line with the strategic mission of the Wikimedia movement, but it will be required to continue expanding our overall information ecosystem, which is what creates up-to-date training data that LLMs rely on.
Equity. At their best, LLMs can expand the accessibility of information and offer innovative ways to deliver information to knowledge seekers. To do so, these platforms need to build in checks and balances that do not perpetuate information biases, widen knowledge gaps, continue to erase traditionally excluded histories and perspectives, or contribute to human rights harms. LLMs should also consider how to identify, address, and correct biases in training data that can produce inaccurate and wildly inequitable results.
Transparency. LLMs and the interfaces to them should allow humans to understand the source of, verify, and correct model outputs. Increased transparency in how outputs are generated can help us understand and then mitigate harmful systemic biases. By allowing users of these systems to assess causes and consequences of bias that may be present in training data or in outputs, creators and users can take part in understanding these tools and applying them thoughtfully.

A vision for a trusted future

Human contributions are an essential part of the internet. People are the engine that has driven online growth and expansion, and created an incredible place for learning, for business, and for connecting with others.

Could a generative AI replace Wikipedia? It could try, but it would result in a replacement that no one really wants. There’s nothing inevitable about new technology. Instead, it’s up to us all to choose what is most important. We can prioritize human understanding and contribution of knowledge back to the world – sustainably, equitably, and transparently – as a key goal of generative AI systems, not as an afterthought. This would help mitigate increasing misinformation and hallucinations from LLMs; ensure human creativity is recognized for the knowledge that’s created; and, most importantly, ensure that LLMs and people alike can continue to rely on an up-to-date, evolving, and trustworthy information ecosystem for the long term.

Selena Deckelmann is Chief Product and Technology Officer at the Wikimedia Foundation.