RFP/XML Dumps Help

From the Wikimedia Foundation

Statement of Purpose

Wikipedia and its sister projects consist of content that is free to study, share, improve and reuse. One of the ways we make the content available for study and for reuse is by publishing XML dumps of the content and its associated metadata at regular intervals. As with most other tasks carried out by the Wikimedia Foundation, we do this on a lean budget, making the most of our hardware and other resources. The English-language Wikipedia is the 800-pound gorilla in the room: its dump takes two weeks to run if nothing breaks, and it hits every edge case in the book. We're looking for a contractor who will assist with the continuing development of the XML dumps infrastructure, with a special emphasis on the Wikipedia dumps.

Background Information

The Wikimedia Foundation, Inc. is a nonprofit charitable organization dedicated to the growth, development and distribution of free, multilingual content, and to providing the full content of these wiki-based projects to the public free of charge. The Wikimedia Foundation operates some of the largest collaboratively edited reference projects in the world, including Wikipedia, a top-ten internet property.

Interested parties should review current documentation on this project.

Scope of Work

Duties include locating current or possible bottlenecks in the dump infrastructure, developing a test suite that covers existing features, designing and implementing data integrity checks, and improving or replacing current methods for parceling out pieces of a run to different hosts. The scripts are written in Python, and they call various standalone PHP scripts from MediaWiki as well as C library routines for manipulating bzip2 blocks. Our servers run a mix of Ubuntu 10.04 and 8.04.
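As a rough illustration of the kind of data integrity check mentioned above, the following is a minimal sketch, not part of the existing dump scripts. It assumes a dump file is bzip2-compressed XML whose final element is the closing `</mediawiki>` tag; the function name `check_bz2_dump` and its parameters are hypothetical.

```python
import bz2


def check_bz2_dump(path, closing_tag=b"</mediawiki>", chunk_size=1 << 20):
    """Return True if the file decompresses cleanly and ends with closing_tag.

    Hypothetical helper: the closing tag and chunk size are assumptions.
    """
    tail = b""
    try:
        with bz2.open(path, "rb") as stream:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                # Keep only enough trailing bytes to test for the final tag.
                tail = (tail + chunk)[-(len(closing_tag) + 64):]
    except (OSError, EOFError):
        # Corrupt or truncated compressed data.
        return False
    return tail.rstrip().endswith(closing_tag)
```

Reading in fixed-size chunks keeps memory use flat regardless of dump size. A real check would likely also validate the XML itself and compare page or revision counts against the database.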

Outcome and Performance Standards

You are expected to work about 40 hours a week on average. During at least 10 of these (flexible) hours you are required to be available online for collaboration with the (international) Foundation team. In addition to maintaining regular communication with your point of contact, you will take part in milestone check-ins with the Foundation to discuss progress and activities.

Term of Contract

Your initial contract will run for 6 months and will commence as soon as possible. Renegotiation at the end of the contract is optional.

Payments, Incentives, and Penalties

Rate will be determined by level of experience and expertise.

Contractual Terms and Conditions

Required qualifications

Respondent parties are expected to:

  • Have strong knowledge of Python, PHP, and C
  • Have solid experience with production and processing of large datasets
  • Be able to work independently where needed, and to work remotely as part of a globally distributed team
  • Be able to learn quickly
  • Have relevant hands-on experience and be eager to learn and try new concepts
  • Be comfortable in a highly collaborative, consensus-oriented environment
  • Be a proficient English speaker


  • Prior work experience in QA and/or design and development of test suites is a plus
  • Experience with performance monitoring and testing on Linux is a plus
  • Experience with MediaWiki is a plus
  • Understanding of the free culture movement is a plus

The ideal candidate will be creative, highly motivated, and able to operate effectively in multiple cultural contexts.

Points of contact for future correspondence

CT Woo, Director of Technical Operations