The Internet Archive, the Wikimedia Foundation, and volunteers from the Wikipedia community have now fixed more than one million broken outbound web links on English Wikipedia. For three years, the Internet Archive has monitored all new and edited outbound links from English Wikipedia and archived them soon after the changes were made to articles. Combined with other web archiving projects, this means that as pages on the Web become inaccessible, links to archived versions in the Internet Archive’s Wayback Machine can take their place. On English Wikipedia, more than one million links now point to preserved copies of missing web content.
This story is a testament to the sharing, cooperative nature and resulting benefits of the open world.
What do you do when good web links go bad? If you are a volunteer editor on Wikipedia, you start by writing software to examine every outbound link in English Wikipedia to make sure it is still available via the “live web.” If, for whatever reason, it is no longer good (e.g., it returns a “404” error code or “Page Not Found”), you check to see if an archived copy of the page is available via the Internet Archive’s Wayback Machine. If it is, you instruct your software to edit the Wikipedia page to point to the archived version, taking care to let users of the link know they will be visiting a version via the Wayback Machine.
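The workflow described above can be sketched in a few lines of Python. This is only an illustration, not the code InternetArchiveBot actually runs: the function names (is_dead, fetch_status, closest_snapshot, fetch_snapshot, repair_link) are hypothetical, treating any 4xx or 5xx status as "dead" is an assumption, and the lookup uses the Wayback Machine's public availability endpoint at archive.org/wayback/available, which the bot may or may not rely on. The step of editing the Wikipedia page itself is left out.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Public Wayback Machine availability endpoint (assumed here; not confirmed
# as the lookup the bot itself uses).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def is_dead(status: Optional[int]) -> bool:
    """Assumption: no response at all, or any 4xx/5xx status, means a dead link."""
    return status is None or status >= 400


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Ask the live web for the page; return its HTTP status, or None on failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 "Page Not Found"
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, timeout, malformed URL, ...


def closest_snapshot(payload: dict) -> Optional[str]:
    """Pull the archived URL out of an availability response, if one exists."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def fetch_snapshot(url: str, timeout: float = 10.0) -> Optional[str]:
    """Query the Wayback Machine for an archived copy of the given URL."""
    query = urllib.parse.urlencode({"url": url})
    lookup = f"{AVAILABILITY_ENDPOINT}?{query}"
    with urllib.request.urlopen(lookup, timeout=timeout) as response:
        return closest_snapshot(json.load(response))


def repair_link(
    url: str,
    get_status: Callable[[str], Optional[int]] = fetch_status,
    get_snapshot: Callable[[str], Optional[str]] = fetch_snapshot,
) -> Optional[str]:
    """Return an archived URL to use in place of a dead link, else None."""
    if not is_dead(get_status(url)):
        return None  # still on the live web; leave the citation alone
    return get_snapshot(url)  # None if the Wayback Machine has no copy
```

Passing the status check and the snapshot lookup in as parameters keeps the decision logic testable without touching the network; for example, repair_link(url, get_status=lambda u: 404, get_snapshot=lambda u: archived) returns archived. A real bot would also need rate limiting, retries before declaring a link dead, and a note in the citation telling readers they are visiting an archived copy.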
That is exactly what Maximilian Doerr and Stephen Balbach have done. As a result of their work, in close collaboration with the non-profit Internet Archive and the Wikimedia Foundation’s Wikipedia Library program and Community Tech team, more than one million broken links have now been repaired. For example, footnote #85 in the article about Easter Island now links to the Wayback Machine instead of a now-missing page. Pretty cool, right?
“We are honored to work with the Wikipedia community to help maintain the cultural treasure that is Wikipedia,” said Brewster Kahle, founder and Digital Librarian of the Internet Archive, home of the Wayback Machine. “By editing broken outbound links on English Wikipedia to their archived versions available via the Wayback Machine, we are helping to provide persistent availability to reference information. Links that would have otherwise led to a virtual dead end.”
“What Max and Stephen have done in partnership with Mark Graham at the Internet Archive is nothing short of critical for Wikipedia’s enduring value as a shared repository of knowledge. Without dependable and persistent links, our articles lose their backbone of reliable sources. It’s amazing what a few people can do when they are motivated by sharing, and preserving, knowledge,” said Jake Orlowitz, head of the Wikipedia Library.
“Having the opportunity to contribute something big to the community with a fun task like this is why I am a Wikipedia volunteer and bot operator. It’s also the reason why I continue to work on this never-ending project, and I’m proud to call myself its lead developer,” said Maximilian, the primary developer and operator of InternetArchiveBot.
So, what is next for this collaboration between Wikipedia and the Internet Archive? Well… there are nearly 300 Wikipedia language editions to rid of broken links. And, we are exploring ways to help make links added to Wikipedia self-healing. It’s a big job and we could use help.
Making the web more reliable… one web page at a time. It’s what we do!
Mark Graham, Director, Wayback Machine Project
A huge thank you to Kenji Nagahashi, Vinay Goel, John Lekashman, Mark Graham, Maximilian Doerr, Stephen Balbach, the Wikimedia Foundation, Wikipedia community members, and Brewster Kahle.