What galleries, libraries, archives, and museums can teach us about multimedia metadata on Wikimedia Commons

Wikimedia Commons is one of the world’s largest free-licensed media repositories with over 40 million image, audio, and video files. But the MediaWiki software platform that Commons is built on was designed for text, not rich media. This creates challenges for everyone who uses Wikimedia Commons: media contributors, volunteer curators, and those who use media hosted on Commons—on Wikimedia projects and beyond.

One of the main challenges for people who want to contribute media to Commons, or find things that are already there, is the lack of consistent metadata—information about a media file such as who created it, what it is, where it’s from, what it shows, and how it relates to the rest of the files in Commons’ massive archive.

The Structured Data on Commons (SDC) program aims to address this metadata problem by creating a more consistent, structured way of entering and retrieving important metadata. This structured data functionality, based on the same technology that powers Wikidata, will allow people to describe media files in greater detail, find relevant content more easily, and keep track of what happens to a piece of media after it’s uploaded.

One of the challenges SDC faces in this massive and ambitious redesign project is prioritization: What problems are we trying to solve? Who experiences those problems? Which ones should we tackle first? How can we avoid breaking other things in the process?

To answer these questions, we have been performing user research with different kinds of Commons participants, starting with GLAM projects—Galleries, Libraries, Archives, and Museums. We chose GLAM as the initial user research focus for a few reasons:

GLAM projects upload massive amounts of high-quality media
The people who upload media for GLAMs have different levels of previous experience with editing Wikimedia projects, ranging from complete newbies to veteran editors
The types of files being uploaded by GLAM projects, and the amount and kind of metadata available about those files, are similarly diverse

In other words, the motivations, needs, and workflows of GLAM project participants are diverse enough that they probably overlap with those of many other people who contribute and consume media on Commons every day. Improving the Commons experience for GLAMs will likely benefit other users as well.

Our research into how to support GLAM projects during the transition to structured data on Commons began with a workshop in February 2017 at the European GLAM coordinators meeting. Between July and October 2017, we interviewed a dozen GLAM project participants from Africa, the Americas, Europe, and South Asia, and ran surveys of GLAM participants and Commons editors.

We organized findings from the workshop, interviews, and surveys into five themes. Each theme represents a set of challenges and opportunities related to the way GLAM projects currently interact with Wikimedia Commons:

Preserving important metadata about media items
Functionality and usability of upload tools
Monitoring activity and impact after upload
Preparing media items for upload
Working with Wikimedia and Wikimedians

Through our research we were able to document a rich diversity of roles, goals, tools, and activities across GLAM projects, as well as identify motivations, unmet needs, and pain points that many GLAMs have in common. We also observed a number of inventive workarounds that GLAM project participants used to capture important metadata that wasn’t easy to record using the current systems like categories and templates. These workarounds illustrate the importance of metadata for making a file or a collection findable, useful, and usable, and the need for better ways to record all of that vital contextual information.

In the Structured Data on Commons program, we also regularly consult with Wikimedia Commons editors about how structured data will impact their work. With input from the Commons community, we developed a prioritized list of important community-developed tools for organizing media on Commons, which also helps us to understand typical workflows and to prioritize functionalities.

Research findings and community feedback will be combined into personas, journey maps, and user stories to help product teams set development priorities and define requirements for improving Commons file pages, upload tools, and search interfaces that use structured data.

A full report of our GLAM interview and survey research is available on the Research portal on meta.wikimedia.org, along with slides and a video of a recent presentation of findings.

The next steps of this project include additional interviews with Commons editors to understand how structured data will impact ongoing curation activities. We are also interested in speaking with re-users of Commons media outside of the Wikimedia movement to learn how structured data can make Commons an even more valuable global resource for high-quality free-licensed media. Get in touch with us at jmorgan[at]wikimedia[dot]org and sfauconnier[at]wikimedia[dot]org.

Jonathan T. Morgan, Senior Design Researcher
Sandra Fauconnier, Community Liaison
Wikimedia Foundation