
A new platform to explore statistics about Wikimedia projects

Photo by SpaceX, CC0.

Wikistats 2 builds on the success of Wikistats, the project started more than 15 years ago by Erik Zachte. Wikistats has been the canonical source of statistics about the reach and impact of the Wikimedia movement for many years. It offered a quantitative mirror to the Wikimedia communities to reflect on their growth, gaps and strategic opportunities. It also provided one of the earliest public data sources for the study of large-scale peer production communities, and as such has been cited nearly a thousand times in the literature.

As detailed in Wikistats 2’s documentation, there are several noticeable changes in the new site’s design, but the biggest changes come on the backend. In this post, we’ll detail what changes you’ll see, and explain how to access the data programmatically.

What’s new? Pretty much everything … but the data!

The data-processing pipeline for the new Wikistats has been rebuilt from scratch. It uses open source distributed-computing technologies such as Hadoop, Spark, Sqoop, and Hive to ingest and enhance project data, and loads a prepared version of the whole history of every project into Druid, a fast-computing analytics server. Druid then serves sliced and diced subsets of the data through the Analytics Query Service, the MediaWiki external API for analytics data.

A brand new front end has also been designed and built on top of the new API. The dashboard gathers a lot of information in one place, providing an easy way to get an overview of any project at a glance. More details can be found in the three sections of the dashboard, which are labeled Contributing, Reading and Content. The Contributing section is about edits and editors, the Reading section covers visited articles and unique devices, and the Content section contains article-level statistics.

You may notice that the data in Wikistats 2 is the same data that existed in Wikistats. For this alpha release, we decided to replicate the existing metrics. In doing so we had two goals in mind: we wanted to test this new dashboard against a time-tested one, and we also wanted to provide existing Wikistats users with statistics that closely match those they are familiar with. We succeeded relatively well at replicating the existing statistics.

How to access the data programmatically

You can access the same data that powers the new user interface by querying a RESTful API. The full documentation is available on this page, but we’ll walk you through some examples.

Let’s get the number of edits made every day in October 2017 for Wikipedia in Spanish:

https://wikimedia.org/api/rest_v1/metrics/edits/aggregate/es.wikipedia.org/all-editor-types/all-page-types/daily/20171001/20171101

Two parameters in the above URL tell us about editor-types and page-types. The editor-types parameter lets you filter by anonymous users (anonymous), registered users declared as bots (group-bot), registered users not declared as bots but that we suspect are bots nonetheless (name-bot), and registered users that we think are legitimate humans (user). The page-types parameter is about content versus non-content pages. Content pages are located in the main namespace, while non-content pages are talk pages and pages in other special namespaces.
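As a minimal sketch of how those path segments fit together, the Python snippet below rebuilds the Spanish Wikipedia URL shown above and a variant restricted to human editors. The helper name `edits_aggregate_url` and the constants are illustrative assumptions, not part of the API; only the segment values that appear in this post are used.

```python
# Sketch only: the helper and constant names are illustrative, not part of the API.
BASE_URL = "https://wikimedia.org/api/rest_v1/metrics"

# Editor types named in this post, plus the "no filtering" value from the example URL.
EDITOR_TYPES = ("all-editor-types", "anonymous", "group-bot", "name-bot", "user")


def edits_aggregate_url(project, editor_type, page_type, granularity, start, end):
    """Build an edits/aggregate URL; start and end are YYYYMMDD strings."""
    if editor_type not in EDITOR_TYPES:
        raise ValueError(f"unknown editor type: {editor_type!r}")
    parts = [BASE_URL, "edits", "aggregate", project,
             editor_type, page_type, granularity, start, end]
    return "/".join(parts)


# Every edit on Spanish Wikipedia in October 2017, day by day.
all_edits = edits_aggregate_url(
    "es.wikipedia.org", "all-editor-types", "all-page-types",
    "daily", "20171001", "20171101")

# Same query, restricted to registered users believed to be human.
human_edits = edits_aggregate_url(
    "es.wikipedia.org", "user", "all-page-types",
    "daily", "20171001", "20171101")

print(all_edits)
print(human_edits)
```

Validating the editor type locally is a design choice for catching typos early; the API itself remains the authority on which values it accepts.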

A second example: We want to find the number of human editors who made at least 100 edits over the course of a month, for each month between January and July 2017 on the Commons project:

https://wikimedia.org/api/rest_v1/metrics/editors/aggregate/commons.wikimedia.org/user/all-page-types/100..-edits/monthly/20170101/20170701

This request introduces a new parameter, named activity-level. It is defined for requests on editors and edited-pages and lets you filter for specific levels of activity (1..4-edits, 5..24-edits, 25..99-edits, 100..-edits, or all-activity-levels for no filtering).
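The activity-level segment follows a simple "minimum..maximum-edits" pattern, which the sketch below formats from numbers before rebuilding the Commons URL shown above. The function names are hypothetical, and the helper can produce ranges the API may not accept; only the levels listed in this post should be expected to work.

```python
# Sketch only: function names are illustrative, not part of the API.
BASE_URL = "https://wikimedia.org/api/rest_v1/metrics"


def activity_level(minimum=None, maximum=None):
    """Format an activity-level segment such as '5..24-edits' or '100..-edits'.

    With no arguments, return the "no filtering" value.
    """
    if minimum is None:
        return "all-activity-levels"
    upper = "" if maximum is None else str(maximum)
    return f"{minimum}..{upper}-edits"


def editors_aggregate_url(project, editor_type, page_type, level,
                          granularity, start, end):
    """Build an editors/aggregate URL; start and end are YYYYMMDD strings."""
    parts = [BASE_URL, "editors", "aggregate", project,
             editor_type, page_type, level, granularity, start, end]
    return "/".join(parts)


# Human editors on Commons with an open-ended "100 or more edits" level.
url = editors_aggregate_url(
    "commons.wikimedia.org", "user", "all-page-types",
    activity_level(100), "monthly", "20170101", "20170701")

print(url)
```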

And a last one, just for fun! Let’s say we want to find the number of pages visited by regular users (not bots) from December 2016 through January 2017 on the English-language edition of Wikipedia. You can see how to add dates below:

https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/en.wikipedia.org/all-access/user/daily/20161201/20170131
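Dates in these URLs are written as YYYYMMDD. The sketch below formats them from Python date objects, rebuilds the pageviews URL above, and defines a small helper that could fetch the result. The parameter names `access` and `agent`, the helper names, and the User-Agent string in the usage note are assumptions for illustration; the post does not describe the response body, so the helper simply returns whatever JSON comes back.

```python
import json
from datetime import date
from urllib.request import Request, urlopen

# Sketch only: function names are illustrative, not part of the API.
BASE_URL = "https://wikimedia.org/api/rest_v1/metrics"


def yyyymmdd(day):
    """Format a date the way the example URLs write it, e.g. 20161201."""
    return day.strftime("%Y%m%d")


def pageviews_aggregate_url(project, access, agent, granularity, start, end):
    """Build a pageviews/aggregate URL from date objects."""
    parts = [BASE_URL, "pageviews", "aggregate", project,
             access, agent, granularity, yyyymmdd(start), yyyymmdd(end)]
    return "/".join(parts)


def fetch_json(url, user_agent):
    """Send a GET request and parse the JSON body (not called in this sketch)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return json.load(response)


url = pageviews_aggregate_url(
    "en.wikipedia.org", "all-access", "user", "daily",
    date(2016, 12, 1), date(2017, 1, 31))

print(url)
```

To run the query, something like `fetch_json(url, "my-stats-script/0.1 (me@example.org)")` should work, with a User-Agent that identifies your own script; inspect the returned structure before relying on any field names.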

That’s it! Please let us know what you like or dislike about the new dashboard, and in particular, don’t hesitate to file bugs. This will help us graduate this alpha version to the beta stage.

Joseph Allemandou, Senior Software Engineer, Analytics
Wikimedia Foundation
