Hiring a data scientist

Note: this post applies to employers hiring Data Analysts, Data Scientists, Statisticians, Quantitative Analysts, or any one of the dozen more titles used for descriptions of the job of “turning raw data into understanding, insight, and knowledge” (Wickham & Grolemund, 2016), the only differences being the skills and disciplines emphasized.

We recently needed to backfill a data analyst position at the Wikimedia Foundation. If you’ve hired for this type of position in the past, you know that this is no easy task—both for the candidate and the organization doing the hiring.

Based on our successful hiring process, we’d like to share what we learned, and how we drew on existing resources to synthesize a better approach to interviewing and hiring a new member of our team.

Why interviewing a data scientist is hard

It’s really difficult to structure an interview for data scientist positions. In technical interviews, candidates are often asked to recite or invent algorithms on a whiteboard. In data science interviews specifically, candidates are often asked to solve probability puzzles that seem similar to homework sets in an advanced probability theory class. This shows that they can memorize formulas and can figure out the analytical solution to the birthday problem in 5 minutes, but it doesn’t necessary indicate whether they can take raw, messy data, tidy it up, visualize it, glean meaningful insights from it, and communicate an interesting, informative story.

These puzzles, while challenging, often have nothing to do with actual data or the kinds of problems that would be encountered in an actual working environment. It can be both a frustrating experience for candidates and organizations alike—which is why we wanted to think about a better way to hire a data scientist for our team.

We also wanted our process to attract diverse candidates. As Stacy-Marie Ishmael, a John S. Knight Fellow at Stanford University and former Managing Editor for Mobile at BuzzFeed News, put it: “job descriptions matter… and where they’re posted matter[s] even more.”

In this post we will walk you through the way we structured our job description and interview questions, and how we created a task for candidates to complete to assess their problem-solving skills.

How to write a job post that attracts good, diverse candidates

Defining “data scientist”

The most obvious (but sometimes overlooked) issue in hiring a data scientist is figuring out what kind of skillset you’re actually looking for. The term “data scientist” is not standard; different people have different opinions about what the job entails depending on their background.

Jake VanderPlas, a Senior Data Science Fellow at the University of Washington’s eScience institute, describes data science as “an interdisciplinary subject” that “comprises three distinct and overlapping areas: the skills of a statistician who knows how to model and summarize datasets (which are growing ever larger); the skills of a computer scientist who can design and use algorithms to efficiently store, process, and visualize this data; and the domain expertise—what we might think of as ‘classical’ training in a subject—necessary both to formulate the right questions and to put their answers in context.”

That’s more or less the description I personally subscribe to, and the description I’ll be using for the rest of this piece.

How to ensure you’re attracting a diverse group of candidates

Now that you’ve defined “data scientist,” it’s necessary to move onto the next section of your job description: what a person actually will do!

The exact phrasing of job descriptions is important because research in this area has shown that women feel less inclined to respond to “male-sounding” job ads and truly regard “required qualifications” as required qualifications. In a study of gendered wording in job posts by Gaucher et al. (2011), they found that “job advertisements for male-dominated areas employed greater masculine wording than advertisements within female-dominated areas,” and “when job advertisements were constructed to include more masculine than feminine wording, participants perceived more men within these occupations and women found these jobs less appealing.”

We had a job description (J.D.) that was previously used for hiring me, but it wasn’t perfect—it included lines like “Experience contributing to open source projects,” which could result in preference for people who enter and stay in open source movement because they don’t experience the same levels of harassment that others experience, or a preference for people who have the time to contribute to open source projects (which may skew towards a certain type of person.)

We consulted Geek Feminism wiki’s how-to on recruiting and retaining women in tech workplaces and the solutions to reducing male bias in hiring when rewriting the job description so as to not alienate any potential candidates. From that document, we decided to remove an explicit requirement for years of experience and called out specific skills that women are socialized to be comfortable with associating with themselves, adding time management to required skills and placing greater emphasis on collaboration.

Once we finished this draft, we asked for feedback from several colleagues who we knew to be proponents of diversity and intersectionality.

A super important component of this: we did not want to place the burden of diversifying our workforce on the women or people of color in our workplace. Ashe Dryden, an inclusivity activist and an expert on diversity in tech spaces, wrote, “Often the burden of fostering diversity and inclusion falls to marginalized people,” and, “all of this is often done without compensation. People internal to the organization are tasked with these things and expected to do them in addition to the work they’re already performing” We strongly believe that everyone is responsible for this, and much has been written about how the work of “[diversifying a workplace] becomes a second shift, something [members of an underrepresented group] have to do on top of their regular job.” To remedy this, we specified colleagues to give feedback during their office hours, when/if they had time for it (so it wouldn’t negatively affect their work), and only if they actually wanted to help out.

From the feedback, we rephrased some points and included an encouragement for diverse range of applicants (“Wikimedia Foundation is an equal opportunity employer, and we encourage people with a diverse range of backgrounds to apply. We also welcome remote and international applicants across all timezones.”). We then felt confident publishing the job description, which our recruiters advertised on services like LinkedIn. In addition, we wanted to advertise the position where DataSci women would congregate, so I reached out to a friend at R-Ladies (a network of women using R) who was happy to let the mailing list know about this job opening.

In short: be proactive, go where people already congregate, and ensure your language in a job post is as inclusive as possible, and you will likely attract a wider pool of potential candidates.

Sample Job Description

You might be asking yourself, “So what did this job description actually look like?” Here it is, with important bits bolded and two italicized notes interjected:

• • •

The Wikimedia Foundation is looking for a pragmatic, detail-oriented Data Analyst to help drive informed product decisions that enable our communities to achieve our Vision: a world in which every single human being can freely share in the sum of all knowledge.

Data Analysts at the Wikimedia Foundation are key members of the Product team who are the experts within the organization on measuring what is going on and using data to inform the decision making process. Their analyses and insights provide a data-driven approach for product owners and managers to envision, scope, and refine features of products and services that hundreds of millions of people use around the world.

You will join the Discovery Department, where we build the anonymous path of discovery to a trusted and relevant source of knowledge. Wikimedia Foundation is an equal opportunity employer, and we encourage people with a diverse range of backgrounds to apply. We also welcome remote and international applicants across all timezones.

As a Data Analyst, you will:

Work closely with product managers to build out and maintain detailed on-going analysis of the department’s products, their usage patterns and performance.
Write database queries and code to analyze Wikipedia usage volume, user behaviour and performance data to identify opportunities and areas for improvement.
Collaborate with the other analysts in the department to maintain our department’s dashboards, ensuring they are up-to-date, accurate, fair and focussed representations of the efficacy of the products.
Support product managers through rapidly surfacing positive and adverse data trends, and complete ad hoc analysis support as needed.
Communicate clearly and responsively your findings to a range of departmental, organisational, volunteer and public stakeholders – to inform and educate them.

• • •

Notice the emphasis on collaboration and communication—the social aspect, rather than technical aspect of the job.

• • •

Requirements:

Bachelor’s degree in Statistics, Mathematics, Computer Science or other scientific fields (or equivalent experience).
Experience in an analytical role extracting and surfacing value from quantitative data.
Strong eye for detail and a passion for quickly delivering results for rapid action.
Excellent written, verbal, scientific communication and time management skills.
Comfortable working in a highly collaborative, consensus-oriented environment.
Proficiency with SQL and R or Python.

Pluses:

Familiarity with Bayesian inference, MCMC, and/or machine learning.
Experience editing Wikipedia or with online volunteers.
Familiarity with MediaWiki or other participatory production environments.
Experience with version control and peer code review systems.
Understanding of free culture / free software / open source principles.
Experience with JavaScript.

• • •

Notice how we differentiate between requirements and pluses. Other than SQL and R/Python, we don’t place a lot of emphasis on technologies and specific advanced topics in statistics. We hire knowing that the candidate is able to learn Hive and Hadoop and that they can learn about multilevel models and Bayesian structural time series models if a project requires it.

• • •

Benefits & Perks *

Fully paid medical, dental and vision coverage for employees and their eligible families (yes, fully paid premiums!)
The Wellness Program provides reimbursement for mind, body and soul activities such as fitness memberships, massages, cooking classes and much more
The 401(k) retirement plan offers matched contributions at 4% of annual salary
Flexible and generous time off – vacation, sick and volunteer days
Pre-tax savings plans for health care, child care, elder care, public transportation and parking expenses
For those emergency moments – long and short term disability, life insurance (2x salary) and an employee assistance program
Telecommuting and flexible work schedules available
Appropriate fuel for thinking and coding (aka, a pantry full of treats) and monthly massages to help staff relax
Great colleagues – international staff speaking dozens of languages from around the world, fantastic intellectual discourse, mission-driven and intensely passionate people

* for benefits eligible staff, benefits may vary by location

• • •

Take-home task

Many engineering and data science jobs require applicants to complete problems on a whiteboard. We decided not to do this. As Tanya Cashorali, the Founder of TCB Analytics, put it: “[Whiteboard testing] adds unnecessary stress to an environment that’s inherently high stress and not particularly relevant to real-world situations.” Instead, we prefer to give candidates a take-home task. This approach gives candidates the opportunity to perform the necessary background research, get acquainted with the data, thoroughly explore the data, and use the tools they are most familiar with to answer questions.

After our candidates passed an initial screening, they were given 48 hours to complete this task, inspired by this task that I had completed during my interview process. The tasks were designed so the candidate would have to:

Develop an understanding and intuition for the provided dataset through exploratory data analysis
Demonstrate critical thinking and creativity
Deal with real world data and answer actual, potentially-open-ended questions
Display knowledge of data visualization fundamentals
Write legible, commented code
Create a reproducible report (e.g. include all code, list all dependencies) with a summary of findings

We recommend designing a task that uses your own data and a question you’ve answered previously, to give candidates an example of their day-to-day work in the future. If your team or organization have worked on a small-scale, data-driven project to answer a particular business question, a good starting point would be to convert that into the take-home task.

• • •

Interview questions

Now that you have your candidates, you have to interview them. This, too, can be tricky—but we wanted to judge each candidate on their merits, so we created a matrix ahead of time that could measure their answers.

One of the things we wanted to emphasize was how our prospective applicants thought about privacy and ethics. From how we handle requests for user data, to our public policy on privacy, our guidelines for ethically researching Wikipedia, and our conditions for research efforts, it is clear that privacy and ethical considerations are really important to the Wikimedia Foundation, and we wanted to ensure that final candidates could both handle the data and the privacy concerns that come with this job.

When we thought about the sorts of questions we’ve been asked in previous interviews and the kinds of topics that were important for us, we devised the following goals:

Assess candidate’s critical thinking and research ethics
Require candidate to interpret, not calculate/generate results
Learn about candidate’s approach to analysis
Gauge candidate’s awareness/knowledge of important concepts in statistics and machine learning

• • •

To that end, I asked the candidates some or all of the following questions within the hour I had with them:

• • •

“What do you think are the most important qualities for a data scientist to have?”
Data Analysis:
1. “What are your first steps when working with a dataset?” (“Exploratory data analysis” is too vague! Inquire about tools they prefer and approaches that have worked for them in the past.)
2. “Describe a data analysis you had the most fun doing. What was the part that you personally found the most exciting?”
3. “Describe a data analysis you found the most frustrating. What were the issues you ran into and how did you deal with them?”
I used this question to assess the candidate’s ability to identify ethics violation in a clear case of scientific misconduct because I wanted to work with someone who understood what was wrong with the case, knew why it was wrong, but also could devise a creative solution that would respect privacy.First, I asked if they’ve heard about the OKCupid fiasco. If they haven’t, I briefly caught them up on the situation, described how answers on OkCupid work (if they didn’t know), and specifically mentioned that the usernames were left in the dataset.
1. “Please discuss the ethical problems with compiling this dataset in the first place and then publicly releasing it.”
2. “You’re an independent, unaffiliated researcher. Maybe you’re a researcher here at the Foundation but you worked on this project in your personal capacity outside of work. Describe the steps you might take to make the individuals in the dataset less easily re-identifiable and the kinds of steps you might take before releasing the dataset.”
Concepts in Statistics:
1. Statistical power, p-value, and effect size is an important trio of concepts in classical statistics that relies on null hypothesis significance testing (NHST). As Andrew Gelman, a professor of Statistics at Columbia University, writes, “naive (or calculating) researchers really do make strong claims based on p-values, claims that can fall apart under theoretical and empirical scrutiny.”I presented outcome of a large sample size (e.g. 10K subjects) A/B test that yielded tiny (e.g. odds ratio of 1.0008) but statistically significant (e.g. p < 0.001) results, and then I asked if we should deploy the change to production. Why or why not?
2. Bootstrapping is a popular and computationally-intensive tool for nontraditional estimation and prediction problems that can’t be solved using classical statistics. While there may be alternative non-parametric solutions to the posed problem, the bootstrap is the simplest and most obvious for the candidate to describe, and we consider it an essential tool in a data scientist’s kit.I asked the candidate how we might approach an A/B test where developed a new metric of success and a similarity measure that we can’t use any of the traditional null hypothesis significance tests for.
3. In statistical models, not satisfying the assumptions can lead the scientist to wrong conclusions by making invalid inferences. It was important for us that the candidate was aware of the assumptions in the most common statistical model and that they understood if/how the hypothetical example violated those assumptions. Furthermore, we wanted to see whether the candidate could offer a more valid alternative from—for example—time series analysis, to account for temporal correlation.“One of the things we’re interested in doing is detecting trends in the usage of our APIs – interfaces we expose to the public so they can search Wikipedia. Say I’ve got this time series of daily API calls in the millions and I fit a simple linear regression model to it and I get a positive slope estimate of 3,000 from which I infer that use of our services is increasing by 3,000 API calls every day. Was this a correct solution to the problem? What did I do wrong? What would you do to answer the same question?”
Concepts in Machine Learning:
1. Model Tuning: Many statistical and machine learning models rely on parameters (and hyperparameters) which must be specified by the user. Sometimes software packages include default values and sometimes those values are calculated from the data using recommended formulas—for example, for a dataset with p features in the example below, m would be √p. A data scientist should not always use the default values and needs to know how parameter tuning (usually via cross-validation) is used to find a custom, optimal value that results in the smallest errors but also avoids overfitting.First, I asked if they knew how a random forest works in general and how its trees are grown. If not, it was not a big deal because I’m not interested in their knowledge of a particular algorithm. I reminded them that at every split the algorithm picks a random subset of m features to decide which predictor to split on, and then I asked what m they’d use.
2. Model Evaluation: It’s not enough to be able to make a predictive model of the data. Whether forecasting or classifying, the analyst needs to be able to assess whether their model is good, how good it is, and what its weaknesses are. In the example below, the classification model might look good overall (because it’s really good at predicting negatives since most of the observations are negatives), but it’s actually terrible at predicting positives! The model learned to maximize its overall accuracy by classifying observations “negative” most of the time.“Let’s say you’ve trained a binary outcome classifier and got the following confusion matrix. This comes out to misclassification rate of 17%, sensitivity of 99%, specificity of 18%, prevalence of 80%, positive predictive value of 83%. Pretend I’m a not-so-technical executive and I don’t know what any of these numbers mean. Is your model good at predicting? What are its pitfalls, if any?”

	Predicted Positive	Predicted Negative
Actual Positive	2K	9K
Actual Negative	500	45K

It worked!

Based on this process, we successfully hired Chelsy Xie—who writes awesome reports, makes fantastic additions to Discovery’s dashboards (like sparklines and full geographic breakdowns), and (most importantly) is super inquisitive and welcomes a challenge (core traits of a great data scientist).

This process was easier, in part, because Chelsy was not the first data scientist hired by the Wikimedia Foundation; our process was informed by having gone through a previous hiring cycle, and we were able to improve during this iteration.

It’s harder for employers who are hiring a data scientist for the first time because they may not have someone on their team who can put together a data scientist-oriented interview process and design an informative analysis task. Feel free to use this guide as a way to navigate the process for the first time, or for improving your existing process.

This isn’t the only way to interview a candidate for a data scientist position, nor is it the best way. Much of our thinking on how to approach this task was shaped by our own frustrations as applicants, as well as our experience of what data scientists actually do in the workforce. These insights likely also apply to hiring pipelines in other technical disciplines.

We are also interested in continually improving and iterating on this process. If you have additional tips or would like to share best practices from your own data scientist hiring practices, please share them.

References

Wickham, H. A., & Grolemund, G. (2016). R for Data Science. O’Reilly Media. Retrieved from http://r4ds.had.co.nz/ on 9 January 2017.
VanderPlas, J. (2016). Python Data Science Handbook. O’Reilly Media. Retrieved from https://github.com/jakevdp/PythonDataScienceHandbook on 9 January 2017.
Gaucher, D., Friesen, J., & Kay, A. C. (2011). Evidence that gendered wording in job advertisements exists and sustains gender inequality. Journal of Personality and Social Psychology, 101(1), 109–128. http://doi.org/10.1037/a0022530
Dryden, A. (2014). The Responsibility of “Diversity”. Retrieved from https://www.ashedryden.com/blog/the-responsibility-of-diversity on 17 January 2017.
Cashorali, T. (2017). How to Hire and Test for Data Skills: a One-Size-Fits-All Interview Kit. TCB Analytics Blog. Retrieved from http://tcbanalytics.com/blog/testing-data-skills.html on 9 January 2017.