September 4, 2013

Data Science Has Got Talent as Facebook Launches Competition

Isaac Lopez

So you think you’ve got the talent to be a Facebook data scientist? Hold the resume – you won’t need it. Facebook is moving its data science recruiting to Kaggle, the online data science competition portal.

For those who aren’t familiar with Kaggle, it essentially turns data science into a competitive sport – we previously likened it to a “Tour de France” of data science. Coders are given partial data sets and asked to use their predictive modeling chops to fill in the blanks, with the most accurate predictive algorithm in the pack coming out on top. Competitions can last months, with daily leaderboards reflecting who is closest to the mark.

Last Friday, Facebook launched a competition on Kaggle, with the winning participants receiving consideration for an interview with Facebook for job openings in Menlo Park, Seattle, New York City, and London.

From the competition details:

“This competition tests your text skills on a large dataset from the Stack Exchange sites.  The task is to predict the tags (a.k.a. keywords, topics, summaries), given only the question text and its title. The dataset contains content from disparate stack exchange sites, containing a mix of both technical and non-technical questions.”

Leaderboard on 9/4/2013

Contestants are given two data sets, Train and Test. The Train file contains four columns with ID, Title, Body and Tags – all of which pertain to questions from the Stack Exchange question-and-answer site. The Test file contains the same columns minus the tags column. It is up to the competing data scientist to come up with an algorithm that will predict what values belong in that column.
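To make the Train/Test layout concrete, here is a minimal sketch of one very naive way to fill in the missing column: suggest any tag already seen in Train whose text appears as a word in a Test question. The sample rows, the exact header spellings, the space-separated tag format, and the helper names `known_tags` and `predict_tags` are all assumptions made for illustration; nothing here is taken from the competition's own code or from any contestant's approach.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature stand-ins for the Train and Test files.
# Assumption: tags within the Tags column are separated by spaces.
TRAIN_CSV = """ID,Title,Body,Tags
1,How do I sort a list in python,I have a list of numbers,python list sorting
2,Reverse a string in java,Looking for an idiomatic way,java string
3,Python string formatting,How do I format a string,python string
"""
TEST_CSV = """ID,Title,Body
4,Sorting a python list of strings,What is the fastest way
"""


def known_tags(train_rows):
    """Count how often each tag occurs in the training rows."""
    counts = Counter()
    for row in train_rows:
        counts.update(row["Tags"].split())
    return counts


def predict_tags(row, tag_counts, max_tags=5):
    """Suggest known tags that appear as whole words in the title or body."""
    words = set((row["Title"] + " " + row["Body"]).lower().split())
    hits = [tag for tag in tag_counts if tag in words]
    # Most frequent training tags first; ties broken alphabetically.
    hits.sort(key=lambda tag: (-tag_counts[tag], tag))
    return hits[:max_tags]


train_rows = list(csv.DictReader(io.StringIO(TRAIN_CSV)))
test_rows = list(csv.DictReader(io.StringIO(TEST_CSV)))

tag_counts = known_tags(train_rows)
predictions = {row["ID"]: predict_tags(row, tag_counts) for row in test_rows}
print(predictions)
```

A competitive entry would need far more than word matching, but the sketch shows the shape of the task: learn from rows that have tags, then produce tags for rows that do not.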

Don’t expect to use performance-enhancing data for the competition, though! Facebook says it will be looking at the code to verify the legitimacy of each contestant’s approach, and it explicitly forbids crawling the Stack Exchange sites to look up answers, as well as sharing code with other contestants.

The competition, which began on Friday, August 30, will run for a total of 112 days, ending on Friday, December 20, 2013. There are currently 40 individuals participating in the challenge to become the Facebook Data Science Idol.

A leaderboard has already taken shape, and as of news time, the leading participant, dubbed “Naïve Baseline,” has achieved a mean score of 0.61029 out of the gate. The next highest score belongs to Alec Radford, with 0.54894. Over the next few months, those mean scores will race towards perfection, and a shot at a position at Facebook.
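The article does not say how the mean score is calculated. As a rough sketch only, assuming the score is a per-question F1 (the harmonic mean of precision and recall over predicted tags) averaged across all questions, the calculation could look like this. The function names and sample tags are hypothetical.

```python
def f1(predicted, actual):
    """F1 for one question: harmonic mean of tag precision and recall."""
    predicted, actual = set(predicted), set(actual)
    true_pos = len(predicted & actual)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(actual)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions, truths):
    """Average the per-question F1 over all questions."""
    scores = [f1(p, t) for p, t in zip(predictions, truths)]
    return sum(scores) / len(scores)


# Hypothetical example: two questions, one partly right and one wrong.
example_predictions = [["python", "list"], ["java"]]
example_truths = [["python", "list", "sorting"], ["c#"]]
example_score = mean_f1(example_predictions, example_truths)
print(round(example_score, 5))
```

Under that assumption, a perfect score of 1.0 would require predicting exactly the right set of tags for every question, with no extras and none missing.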

This is becoming a hiring trend for the social network giant, which says this is its third competition of this type as it searches for the best talent in data science and software engineering.
