November 28, 2011

Making Sport of Data Science

Datanami Staff

Sports and data have always gone hand in hand, and now more so than ever, thanks to advanced analytics powering major league decisions and predictions. However, the sport of data science can also be found in another area: among the data scientists themselves.

Kaggle, a freshly funded company that offers a platform on which data scientists compete to solve complex real-world data mining and analytics problems, has received a great deal of press attention due to its competitions, which put companies’ difficult problems before data scientists who solve them for big money.

Below is a video of the company’s CEO, Anthony Goldbloom, talking to a representative from Microsoft’s Business Intelligence unit about what competitions, even within organizations, can yield. He claims that putting the best minds onto a problem under the guise of a technology challenge (not to mention one that can garner an award, monetary or otherwise) is a way of crowdsourcing tough analytics questions that can yield the most innovative solutions.

The interview focuses on the example of an upcoming $3 million award for the scientists who are able to put together the best algorithm for mining health care claims data. Participants will use the data to create models that predict which patients out of the 700,000 on Heritage Provider Network’s list are most likely to end up in the hospital. Using doctor visit, drug, test and other medical data, data scientists who are able to make such predictions effectively could save the company millions.

As Goldbloom explains, predictive analytics for healthcare means that the company could “flag” patients as being top risks for expensive, long hospital stays. To counter these expenses, the flagged individuals could be targeted for preventative care measures. For instance, if a model indicates that Patient X might end up in the hospital for a month later this year, it might be cheaper for Heritage to send a nurse each week to ensure the patient is taking his or her medications and following the recommended route to healthier living.
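The flagging step described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Patient` fields, the hand-picked `WEIGHTS`, and the toy records are hypothetical and are not drawn from Heritage Provider Network data or from any competition entry. A real entry would learn its scoring model from historical claims rather than use fixed weights.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Toy record of one patient's prior-year activity (hypothetical fields)."""
    patient_id: str
    doctor_visits: int
    prescriptions: int
    lab_tests: int


# Hypothetical, hand-picked weights used only for illustration.
# A real model would be fitted to historical claims data.
WEIGHTS = {"doctor_visits": 0.5, "prescriptions": 0.3, "lab_tests": 0.2}


def risk_score(patient: Patient) -> float:
    """Return a simple weighted sum standing in for a predicted hospitalization risk."""
    return (
        WEIGHTS["doctor_visits"] * patient.doctor_visits
        + WEIGHTS["prescriptions"] * patient.prescriptions
        + WEIGHTS["lab_tests"] * patient.lab_tests
    )


def flag_top_risks(patients: list[Patient], k: int) -> list[str]:
    """Rank patients by score, highest first, and return the IDs of the top k."""
    ranked = sorted(patients, key=risk_score, reverse=True)
    return [p.patient_id for p in ranked[:k]]


# Made-up example records, not real patient data.
patients = [
    Patient("A", doctor_visits=2, prescriptions=1, lab_tests=0),
    Patient("B", doctor_visits=12, prescriptions=8, lab_tests=5),
    Patient("C", doctor_visits=6, prescriptions=3, lab_tests=2),
    Patient("D", doctor_visits=1, prescriptions=0, lab_tests=1),
]

flagged = flag_top_risks(patients, k=2)
print(flagged)
```

In this toy setup, the patients returned by `flag_top_risks` would be the ones considered for preventative measures such as the weekly nurse visits mentioned above.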

Goldbloom claims that executives at companies can learn from the crowdsourcing approach to creating advanced analytics to power business efficiency. As he noted during the above interview, “Competitions are a really good way of improving on existing models. We’ve hosted a large number of competitions and have never failed to make the benchmark set by an in-house modeling team or group of academic researchers.”

Goldbloom says that one of the reasons his competitions are so successful is that when a problem is being solved by an established set of statisticians and data scientists within an organization, they tend to apply their own familiar set of techniques and ideas. With a platform like Kaggle, however, new ideas arrive, often from data scientists who have no preconceived notions about the problem itself (only the solution) and who bring their own solution-focused techniques.

Turning data science into a sport and placing the emphasis on innovation and new methods of problem solving might alter the way businesses look at the traditional problem of harnessing analytics to solve big issues. It could give data scientists within a company the freedom to think outside of the box and to work both independently and within the team or organization, and it could inspire them (cha-ching) to create new methods of analyzing and thinking about data within the company.