October 15, 2015

Are There Different ‘Flavors’ of Data Scientist?

Henrik Nordmark

It has been said that we are all “greater than the sum of our parts.” In my view, this maxim is very fitting for data scientists. Here’s why.

Data scientists are the sum of several different academic skill sets: part computer scientist, part mathematician, and part specialist in a particular field, such as ecology. When combined, these skills create individuals capable of uncovering information hidden in data to create solutions for organizations.

However, how each data scientist approaches these problems has a lot to do with the skills that make them up. For businesses, this is no small matter. Striking the right balance of data scientists in a team creates an environment in which problems are approached from different angles and healthy academic debates spark innovation. So how do you know what “type” of data scientists you have?

First, the mix of methodologies that makes up a data scientist is not uniform. A practitioner can be more of a mathematician than a computer scientist, and vice versa.

In general, a “statistician” data scientist will tend to worry more about error terms and emphasize the use of statistical models to describe and predict. They are also usually slightly less familiar with machine learning (although that is changing rapidly). This is in contrast to a data scientist who comes from a computer science background, who will tend to think more about how to query and transform data efficiently.

To add to this complication, these subjects can also be further sub-divided into a number of different specialist areas, or arguably, subjects in their own right.

For example, statistics can be sub-divided into specialist areas such as classical frequentist statistics, Bayesian statistics, and non-parametric statistics.

In practice, this can mean problems are approached in different ways. Frequentists tend to make a lot of hidden assumptions about the nature of the data they are dealing with (e.g., that the data is normally distributed), and they are very focused on unbiased estimators. Bayesians, by contrast, explicitly state what they believe they might see in their data and then update those beliefs once the data has arrived. Finally, non-parametric statisticians tend to make as few assumptions as possible about the distribution of their data.
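To make the contrast concrete, here is a minimal Python sketch of how the three schools might each put an interval around the average of the same small sample. Everything in it is an illustrative assumption rather than a prescribed method: the numbers are made up, the function names are hypothetical, the frequentist interval uses a simple normal approximation, the Bayesian version fixes the noise variance at the sample variance and uses an invented prior, and the non-parametric version is a plain percentile bootstrap.

```python
import random
import statistics
from statistics import NormalDist


def frequentist_interval(data, level=0.95):
    """Normal-approximation confidence interval for the mean.

    Implicitly assumes the data is roughly normally distributed.
    """
    n = len(data)
    mean = statistics.fmean(data)
    std_err = statistics.stdev(data) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std_err, mean + z * std_err


def bayesian_interval(data, prior_mean, prior_sd, level=0.95):
    """Credible interval for the mean from a conjugate normal prior.

    Simplification: the noise variance is treated as known and set
    to the sample variance.
    """
    n = len(data)
    noise_var = statistics.variance(data)
    prior_precision = 1 / prior_sd ** 2
    data_precision = n / noise_var
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_precision * prior_mean + data_precision * statistics.fmean(data)
    )
    half_width = NormalDist().inv_cdf(0.5 + level / 2) * post_var ** 0.5
    return post_mean - half_width, post_mean + half_width


def bootstrap_interval(data, level=0.95, resamples=2000, seed=0):
    """Percentile bootstrap interval for the mean.

    Resamples the observed data with replacement instead of assuming
    a distributional form.
    """
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(resamples)
    )
    lower = means[int((1 - level) / 2 * resamples)]
    upper = means[int((1 + level) / 2 * resamples) - 1]
    return lower, upper


# Made-up sample with one unusually large value.
data = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 9.2, 4.4, 5.6, 4.9]

freq = frequentist_interval(data)
# Hypothetical prior belief: the average is around 4.0, give or take 1.0.
bayes = bayesian_interval(data, prior_mean=4.0, prior_sd=1.0)
boot = bootstrap_interval(data)

print("frequentist:", freq)
print("bayesian:   ", bayes)
print("bootstrap:  ", boot)
```

The three intervals overlap but do not coincide: the Bayesian one is pulled toward the prior, and the bootstrap one need not be symmetric around the sample average.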

Add to this the background of the individual data scientist–from physics to ecology to psychology–and inspiration on how problems can be tackled is drawn from almost all disciplines in which data is used.

Without getting bogged down in the nitty-gritty of how each will practically approach data, it’s sufficient to say that they may reach slightly different conclusions based on the same information.

This is not as scary as it first sounds. As the statistician George Box once wrote, “All models are wrong, but some are useful.” Data science is the quest to translate a question into something that can be answered using data, followed by the application of a variety of techniques to see what will drive an organization forward.

Although it is great to have a team of data scientists with different backgrounds so that different perspectives and approaches can flourish when tackling a business problem, the most important question is not what academic background your data scientist has, but whether he or she has the imagination to apply a variety of different techniques from different fields in novel contexts to answer a question.

This dialogue of ideas and methods is what will bring to the surface the most useful methods to solve a problem and generate insight. Or, to put it another way, the more creative your data scientists are and the more willing they are to look at the world in different ways, the better the results will be for your company.

 

About the author: Henrik is Head of Data Science at Profusion and Visiting Fellow at the University of Essex in the UK. At Profusion, he leads a team of data scientists, data analysts, data architects and interns working on applying statistics and machine learning in novel ways to solve business problems. Henrik is also responsible for research and development at Profusion. He has established close ties with the University of Essex and spearheaded the creation of a government-sponsored Knowledge Transfer Partnership to develop cutting-edge data science techniques by drawing from ideas and methods in artificial intelligence, statistical learning, operational research and economics. Henrik is passionate about allowing data to reveal itself so it can inform our decisions and make us curious about hidden relationships that we may not have initially considered.

(feature art credit:  lassedesignen/Shutterstock.com)

Datanami