Photo by SpaceX, CC0.

Wikistats 2 builds on the success of Wikistats, the project started more than 15 years ago by Erik Zachte. Wikistats has been the canonical source of statistics about the reach and impact of the Wikimedia movement for many years. It offered a quantitative mirror to the Wikimedia communities to reflect on their growth, gaps and strategic opportunities. It also provided one of the earliest public data sources for the study of large-scale peer production communities, and as such has been cited nearly a thousand times in the literature.

As detailed in Wikistats 2’s documentation, there are several noticeable changes in the new site’s design, but the biggest changes come on the backend. In this post, we’ll detail what changes you’ll see, and explain how to access the data programmatically.

What’s new? Pretty much everything … but the data!

The data-processing pipeline for the new Wikistats has been rebuilt from scratch. It uses open-source distributed-computing technologies such as Hadoop, Spark, Sqoop, and Hive to ingest and enhance project data, and loads a prepared version of the whole history of every project into Druid, a fast-computing analytics server. Druid then serves sliced-and-diced subsets of the data through the Analytics Query Service, the MediaWiki external API for analytics data.

A brand-new front end has also been designed and built on top of the new API. The dashboard concentrates a lot of information in one place, providing an easy way to get an overview of any project at a glance. More details can be found in the dashboard's three sections, labeled Contributing, Reading, and Content. The Contributing section covers edits and editors, the Reading section covers visited articles and unique devices, and the Content section contains article-level statistics.

You may notice that the data in Wikistats 2.0 is the same data that existed in Wikistats. For this alpha release, we decided to replicate the existing metrics. In doing so we had two goals in mind: we wanted to test the new dashboard against a time-tested one, and we wanted to provide existing Wikistats users with statistics that closely match those they are familiar with. We succeeded relatively well at replicating the existing statistics.

How to access the data programmatically

You can access the same data that powers the new user interface by querying a RESTful API. The full documentation is available on this page, but we’ll walk you through some examples.

Let’s get the number of edits made every day in October 2017 for Wikipedia in Spanish:

https://wikimedia.org/api/rest_v1/metrics/edits/aggregate/es.wikipedia.org/all-editor-types/all-page-types/daily/20171001/20171101

There are two parameters in the above URL telling us about editor-types and page-types. The editor-types parameter lets you filter by anonymous users (anonymous), registered users declared as bots (group-bot), registered users not declared as bots but that we suspect are bots nonetheless (name-bot), and registered users that we think are legitimate humans (user). The page-types parameter distinguishes content from non-content pages. Content pages are located in the main namespace, while non-content pages are talk pages and pages in other special namespaces.
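As a minimal sketch, the request above could be assembled and fetched from Python using only the standard library. The helper names `edits_url` and `fetch_json` and the User-Agent string are assumptions made for this illustration and are not part of the API. The shape of the JSON response is not described in this post, so the sketch returns it undecoded beyond JSON parsing.

```python
import json
import urllib.request

BASE = "https://wikimedia.org/api/rest_v1/metrics"

# Editor types named in the text above, plus the "all-editor-types"
# wildcard that appears in the example URL.
EDITOR_TYPES = {"all-editor-types", "anonymous", "group-bot", "name-bot", "user"}


def edits_url(project, editor_type, page_type, granularity, start, end):
    """Build an edits/aggregate URL; start and end are YYYYMMDD strings."""
    if editor_type not in EDITOR_TYPES:
        raise ValueError(f"unknown editor type: {editor_type!r}")
    return "/".join([BASE, "edits", "aggregate", project, editor_type,
                     page_type, granularity, start, end])


def fetch_json(url):
    """GET a URL and parse the JSON body (hypothetical helper)."""
    # The User-Agent value is a placeholder chosen for this example.
    request = urllib.request.Request(
        url, headers={"User-Agent": "wikistats-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Daily edits on Spanish Wikipedia for October 2017, all editors and pages.
url = edits_url("es.wikipedia.org", "all-editor-types", "all-page-types",
                "daily", "20171001", "20171101")
print(url)
```

Calling `fetch_json(url)` would perform the actual request. Swapping `"all-editor-types"` for `"user"` or `"anonymous"` restricts the counts to that group of editors.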

A second example: we want to find the number of human editors who made 100 or more edits in a month, for each month between January and July 2017, on the Commons project:

https://wikimedia.org/api/rest_v1/metrics/editors/aggregate/commons.wikimedia.org/user/all-page-types/100..-edits/monthly/20170101/20170701

This request introduces a new parameter, named activity-level. It is defined for requests on editors and edited-pages, and lets you filter for specific levels of activity (1..4-edits, 5..24-edits, 25..99-edits, 100..-edits, or all-activity-levels for no filtering).
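The activity-level buckets can be sketched in Python as follows. The function names `activity_level_for` and `editors_url` are hypothetical helpers invented for this example. The bucket names follow the `N..M-edits` pattern used in the request URL above.

```python
BASE = "https://wikimedia.org/api/rest_v1/metrics"

# Accepted activity-level values, following the N..M-edits naming pattern.
ACTIVITY_LEVELS = (
    "all-activity-levels",
    "1..4-edits",
    "5..24-edits",
    "25..99-edits",
    "100..-edits",
)


def activity_level_for(edit_count):
    """Return the activity-level bucket that an edit count falls into."""
    if edit_count < 1:
        raise ValueError("edit_count must be at least 1")
    if edit_count <= 4:
        return "1..4-edits"
    if edit_count <= 24:
        return "5..24-edits"
    if edit_count <= 99:
        return "25..99-edits"
    return "100..-edits"


def editors_url(project, editor_type, page_type, activity_level,
                granularity, start, end):
    """Build an editors/aggregate URL; start and end are YYYYMMDD strings."""
    if activity_level not in ACTIVITY_LEVELS:
        raise ValueError(f"unknown activity level: {activity_level!r}")
    return "/".join([BASE, "editors", "aggregate", project, editor_type,
                     page_type, activity_level, granularity, start, end])


# Human editors on Commons in the highest activity bucket, month by month.
url = editors_url("commons.wikimedia.org", "user", "all-page-types",
                  activity_level_for(100), "monthly", "20170101", "20170701")
print(url)
```

Note that the highest bucket starts at exactly 100 edits, so `activity_level_for(100)` and `activity_level_for(5000)` select the same bucket.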

And a last one, just for fun! Let’s say we want to find the number of pages visited by regular users (not bots) between December 2016 and January 2017 on the English-language edition of Wikipedia. You can see how to add dates below:

https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/en.wikipedia.org/all-access/user/daily/20161201/20170131
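The start and end dates in these URLs are written as YYYYMMDD. A small sketch, assuming you hold the dates as Python `datetime.date` objects, shows one way to format them. The function name `pageviews_url` is a hypothetical helper for this illustration.

```python
from datetime import date

BASE = "https://wikimedia.org/api/rest_v1/metrics"


def pageviews_url(project, access, agent, granularity, start, end):
    """Build a pageviews/aggregate URL from two datetime.date objects."""
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    return "/".join([BASE, "pageviews", "aggregate", project, access, agent,
                     granularity,
                     start.strftime("%Y%m%d"),   # e.g. 20161201
                     end.strftime("%Y%m%d")])    # e.g. 20170131


# Daily pageviews by regular users on English Wikipedia,
# from 1 December 2016 through 31 January 2017.
url = pageviews_url("en.wikipedia.org", "all-access", "user", "daily",
                    date(2016, 12, 1), date(2017, 1, 31))
print(url)
```

Working with `date` objects rather than raw strings makes it harder to produce malformed dates such as a missing leading zero in the month or day.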

That’s it! Please let us know what you like or dislike about the new dashboard, and in particular, don’t hesitate to file bugs. This will help us graduate this alpha version to the beta stage.

Joseph Allemandou, Senior Software Engineer, Analytics
Wikimedia Foundation
