RFP/Lucene Search Operations Engineer

From the Wikimedia Foundation

Statement of Purpose

Wikipedia and its sister projects consist of content that is free to study, share, improve and reuse. One of the ways we make this content readily available and searchable is by indexing changed content at regular intervals and making it available to our search engines. The Wikimedia Foundation deploys Lucene as the search engine backbone for its Wikimedia projects. We are looking for a consultant to assist with the continuing development and operation of the Search software stack and infrastructure. The candidate should be a subject matter expert on Lucene Search technology and will provide guidance to the Foundation's Technical Operations team on maintaining, improving and migrating the Search infrastructure. Documentation on the current deployment can be found at http://wikitech.wikimedia.org/view/Search.

Background Information

The Wikimedia Foundation, Inc. is a nonprofit charitable organization dedicated to the growth, development and distribution of free, multilingual content, and to providing the full content of these wiki-based projects to the public free of charge. The Wikimedia Foundation operates some of the largest collaboratively edited reference projects in the world, including Wikipedia, a top-ten internet property.

Scope of Work

Work on enhancements and daily operational matters, such as improving the efficiency, capacity and redundancy of the Lucene Search infrastructure:

  • Help in troubleshooting unexpected outages and identifying operational issues
  • Profile and locate performance bottlenecks
  • Use Puppet as the configuration management tool to maintain the manifests for the Lucene configuration
  • Deploy Lucene Search infrastructure at our new data center.

Upgrade and migrate the current Search software stack to work with the latest Lucene version:

  • Upgrade to the current release of Lucene
  • Develop and upgrade the MediaWiki search extensions (MWSearch and Lucene-search) to work with the new Lucene release. The MWSearch extension is a MediaWiki backend that fetches search results from the MediaWiki Lucene-based search engine. Lucene-search extends the Apache Lucene search API with page ranking based on the number of backlinks, distributed searching and indexing, parsing of wiki text, incremental updates, etc.
  • Automate, optimize and document the indexing and deployment process.

Outcome and Performance Standards

You are expected to work about 40 hours a week on average. Although hours are flexible, you are expected to be available online for collaboration with the (international) Foundation team during agreed-upon scheduled times. In addition to maintaining regular communication with your point of contact, you will have milestone check-ins with the Foundation to discuss progress and activities.

Term of Contract

Your initial contract will run for 6 months and will commence as soon as possible. Renegotiation at the end of the contract is optional.

Payments, Incentives, and Penalties

Rate will be determined by level of experience and expertise.

Contractual Terms and Conditions

Required qualifications
Respondent parties are expected to:

  • Have strong knowledge of Lucene, Java, PHP and Linux
  • Have experience with configuration management systems and concepts (e.g. Puppet, Chef, CFEngine)
  • Have experience with operating system distribution packaging systems (e.g. dpkg, RPM)
  • Have solid experience with production and processing of large datasets
  • Be able to work independently where needed, and to work remotely as part of a globally distributed team
  • Have relevant hands-on experience and eagerness to learn and try new concepts
  • Be comfortable in a highly collaborative, consensus-oriented environment
  • Be proficient in spoken English

Preferred qualifications


  • Prior work experience implementing Lucene / Solr Search engines
  • Experience with performance monitoring and testing on Lucene
  • Experience with MediaWiki is a plus
  • Understanding of the free culture movement is a plus

The ideal candidate will be creative, highly motivated, and able to operate effectively in multiple cultural contexts.

Points of contact for future correspondence

CT Woo, Director of Technical Operations