Wikipedia community and Internet Archive partner to fix one million broken links on Wikipedia

Photo by Diego Delso, CC BY-SA 3.0.

The Internet Archive, the Wikimedia Foundation, and volunteers from the Wikipedia community have now fixed more than one million broken outbound web links on English Wikipedia. For three years, the Internet Archive has monitored all new and edited outbound links from English Wikipedia and archived them soon after changes are made to articles. This, combined with other web archiving projects, means that as pages on the Web become inaccessible, links to archived versions in the Internet Archive’s Wayback Machine can take their place. On English Wikipedia, more than one million links now point to preserved copies of missing web content.

This story is a testament to the sharing, cooperative nature and resulting benefits of the open world.

What do you do when good web links go bad? If you are a volunteer editor on Wikipedia, you start by writing software to examine every outbound link in English Wikipedia to make sure it is still available via the “live web.” If, for whatever reason, it is no longer good (e.g. if it returns a “404” error code or “Page Not Found”) you check to see if an archived copy of the page is available via the Internet Archive’s Wayback Machine. If it is, you instruct your software to edit the Wikipedia page to point to the archived version, taking care to let users of the link know they will be visiting a version via the Wayback Machine.
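The check-then-replace loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code that InternetArchiveBot actually runs: the names `LinkRepair`, `plan_repairs`, `is_live`, and `find_archive` are hypothetical, and the two lookups are passed in as plain functions so the sketch runs without touching the network. The example URLs and the snapshot timestamp are made up.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class LinkRepair:
    """One planned edit: swap a dead link for its archived copy."""
    original_url: str
    archive_url: str


def plan_repairs(
    urls: Iterable[str],
    is_live: Callable[[str], bool],
    find_archive: Callable[[str], Optional[str]],
) -> List[LinkRepair]:
    """Return a repair for every dead link that has an archived copy.

    is_live and find_archive are assumptions of this sketch: a real bot
    would back them with an HTTP check and a Wayback Machine lookup.
    """
    repairs = []
    for url in urls:
        if is_live(url):
            continue  # still reachable on the live web, nothing to do
        archive_url = find_archive(url)
        if archive_url is not None:
            repairs.append(LinkRepair(url, archive_url))
        # Dead links with no archived copy are left untouched.
    return repairs


# Stand-in data for the demonstration (all invented).
LIVE = {"https://example.org/still-here"}
ARCHIVED = {
    "https://example.org/gone":
        "https://web.archive.org/web/20160101000000/https://example.org/gone",
}

repairs = plan_repairs(
    [
        "https://example.org/still-here",
        "https://example.org/gone",
        "https://example.org/gone-and-never-archived",
    ],
    is_live=lambda url: url in LIVE,
    find_archive=ARCHIVED.get,
)

for repair in repairs:
    print(repair.original_url, "->", repair.archive_url)
```

Only the dead link with an archived copy ends up in the plan. Keeping the plan separate from the edit itself leaves room for the last step in the description above: telling readers that the link now goes to a Wayback Machine copy.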

That is exactly what Maximilian Doerr and Stephen Balbach have done. As a result of their work, in close collaboration with the non-profit Internet Archive and the Wikimedia Foundation’s Wikipedia Library program and Community Tech team, more than one million broken links have now been repaired. For example, footnote #85 in the article about Easter Island now links to the Wayback Machine instead of a now-missing page. Pretty cool, right?

“We are honored to work with the Wikipedia community to help maintain the cultural treasure that is Wikipedia,” said Brewster Kahle, founder and Digital Librarian of the Internet Archive, home of the Wayback Machine. “By editing broken outbound links on English Wikipedia to their archived versions available via the Wayback Machine, we are helping to provide persistent availability to reference information. Links that would have otherwise led to a virtual dead end.”

“What Max and Stephen have done in partnership with Mark Graham at the Internet Archive is nothing short of critical for Wikipedia’s enduring value as a shared repository of knowledge. Without dependable and persistent links, our articles lose their backbone of reliable sources. It’s amazing what a few people can do when they are motivated by sharing, and preserving, knowledge,” said Jake Orlowitz, head of the Wikipedia Library.

“Having the opportunity to contribute something big to the community with a fun task like this is why I am a Wikipedia volunteer and bot operator. It’s also the reason why I continue to work on this never-ending project, and I’m proud to call myself its lead developer,” said Maximilian, the primary developer and operator of InternetArchiveBot.

So, what is next for this collaboration between Wikipedia and the Internet Archive? Well… there are nearly 300 Wikipedia language editions to rid of broken links. And, we are exploring ways to help make links added to Wikipedia self-healing. It’s a big job and we could use help.

Making the web more reliable… one web page at a time. It’s what we do!

Mark Graham, Director, Wayback Machine Project
Internet Archive

A huge thank you to Kenji Nagahashi, Vinay Goel, John Lekashman, Mark Graham, Maximilian Doerr, Stephen Balbach, the Wikimedia Foundation, Wikipedia community members, and Brewster Kahle.
