Job openings/Summer Community Analytics

Revision as of 03:55, 1 January 2012 by Killiondude (talk | contribs) (small appearance fixes, page probably needs more <br />s)

The Wikimedia Foundation is hiring for Summer Community Analytics positions


Develop your big data skills at Wikimedia, the #5 web site.


THE POSITION

The Wikimedia Foundation (the host of Wikipedia and its sister sites) is putting together a summer team of big data analysts to extract useful intelligence from our unique and complex user behavior data.

Your job for the summer will be to work with Foundation staff and Wikimedia community members to produce new useful graphs, lists, and projections that will bring the Wikimedia movement a better understanding of how our community is working and the growth challenges it faces.

Content on Wikipedia and its sister sites is created, edited, and maintained entirely by volunteers. Nearly all of their activity, contribution history, and interaction with each other is publicly searchable, though much of the meaningful data is stored in the form of raw text.

We now have some new analytics tools that make searching the text history much easier. We need analysts who can work with our unusual data to identify important patterns in user behavior and survival.

The analyses that you conduct will be used as input for community change programs and new feature development decisions. Since the questions that we will be tackling are practical, we are looking for candidates who bring rigorous methods to the analysis of the data and are interested in applied research that produces actionable recommendations. Summary text and/or graphical representations of new findings will be delivered to the Community team weekly.

This is utilitarian work, seeking to provide working data for working community members and Foundation staff -- a very high level of rigor and error checking is required, but this will be a summer break from pure research.


WHO WE ARE LOOKING FOR

  • You love big data and working with advanced new big data tools such as MongoDB, Lucene, Hadoop, HBase and more.
  • You are highly comfortable in SQL and have spent time dealing with relational database optimization.
  • You can set up your own Linux work environment (though you probably won't have to).
  • You're comfortable scripting in Python or another language, and enjoy learning unfamiliar languages and technologies.
  • You have no problem grasping complicated interrelationships in data and you are able to quickly write scripts.
  • You realize that a fast answer is better than no answer, and that this kind of work is inherently iterative and changing.
  • You know what it means to spot check a dataset with the motivation of finding the errors -- even if it's your own and you "know" there aren't any!
  • You are driven by your own intense curiosity and creativity to answer questions about the inner workings of large-scale online communities.
  • You enjoy working in an environment that requires you to adapt to changing circumstances.
  • You are very passionate about the free culture movement in general and the Wikipedia projects in particular.


QUALIFICATIONS

  • Currently pursuing a PhD or MSc in computer science, computational linguistics, statistics, information science, mathematics, or another discipline involving large-scale data analysis. Truly exceptional and mature undergrads will be considered. (We'll be evaluating candidates with an analytics challenge.)
  • Experience with statistical data analysis such as linear models, multivariate analysis, experimental design, and sampling methods.
  • Excellent implementation skills in Python and at least one other language (Java, PHP).
  • Excellent SQL skills.
  • Experience with NoSQL solutions (Hadoop, HBase, Cassandra or MongoDB) is a major plus.
  • Experience with full-text search applications such as Lucene and Solr is a major plus.
  • Excellent knowledge of UNIX/Linux or Windows environments.
  • Self-starter: Able to mobilize the resources you need to get things done, including yourself.
  • Being an editor of Wikipedia is a big plus.


COMPENSATION

Analysts will be compensated at a rate of $23.00/hour. WMF will further support analysts’ relocation with a one-time $2000 housing stipend, paid upfront. Round-trip airfare to SF will also be provided to those residing outside the Bay Area.


COMMITMENT/REQUIREMENTS

Full-time (40 hours a week) employment in San Francisco, California. Applicants will only be considered if they are able to commit to the full 3 months, starting June 1 and ending August 31, 2012. WMF will only consider applicants who are eligible to work in the United States.


HOW TO APPLY

Please send the following materials to Wikimediawork@wikimedia.org with “Summer Analytics” in the subject line. Please be sure to include all of the following in the body of your email and also attach them as a single PDF file:

a) Your CV

b) A 1-2 page cover letter demonstrating your interest in this summer program and how your current research and experience make you an excellent candidate.

If available, also include:

  • A pointer to your website or other online presence
  • Source code of your work, for example on GitHub, your own website, or your Stack Overflow profile
  • Your Wikipedia username and a list of your most important contributions


APPLICATION DEADLINE

Please submit your application by January 15, 2012. Short-listed candidates will be asked to complete a short data analysis task.


ABOUT THE WIKIMEDIA FOUNDATION

The Wikimedia Foundation is a nonprofit charitable organization dedicated to encouraging the growth, development and distribution of free, multilingual content, and to providing the full content of these wiki-based projects to the public free of charge. The Wikimedia Foundation operates some of the largest collaboratively edited reference projects in the world, including Wikipedia, a top-ten internet property.

The Wikimedia Foundation has ambitious goals of improving editor retention, improving article quality and lowering the barriers to entry. These, and other issues, require sophisticated data analysis.