March 15, 2018

What Kind of Data Scientist Are You?

Alex Woodie

(Syda Productions/Shutterstock)

If you’ve worked with the data science community, you’ve probably interacted with data scientists and formed your own definition of the increasingly popular position. But it turns out that not all data scientists are alike: according to a recent analysis by researchers at UCLA and Microsoft, there are actually nine different types of data scientists.

Miryung Kim, an associate professor in UCLA’s Computer Science Department, last week presented a session at the Strata Data Conference that showcased her research into the data science and software development community. The research revolved around a survey of 793 data science practitioners at Microsoft that investigated how they spent their time, what tools they used, and the challenges they faced in their jobs.

Kim and her team ran the survey results through a clustering algorithm (naturally) and published their findings last September in a 17-page paper titled “Data Scientists in Software Teams: State of the Art and Challenges,” which is available from the IEEE Xplore Digital Library.

The first thing Kim and her colleagues discovered was that not everyone practicing data science calls themselves a “data scientist.” Nearly 40% of the survey respondents identified as data scientists, while 24% called themselves software engineers, 18% were program managers, and 20% had some other title. All told, Kim concluded that 532 of the respondents could be considered data scientists.

Experience and education levels also varied. About one-third had bachelor’s degrees, while 22% had PhDs and 41% had master’s degrees. The average experience level was 13.6 years, with an average of about 10 years spent analyzing data.

(Source: “Data Scientists in Software Teams: State of the Art and Challenges” September 2017, Kim et al.)

The clustering algorithm highlighted patterns in how data science practitioners spend their time. Based on the predominant activity of a group, Kim and her team came up with a name that defined that group.
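The article does not say which clustering algorithm the researchers used, so the sketch below swaps in plain k-means purely for illustration. It assumes each respondent is described by the fraction of working time spent on a handful of activities; the activity names and the six respondents are made up, not survey data. It groups similar time profiles and then names each group after its largest activity, mirroring the naming approach described above.

```python
# Hypothetical sketch: cluster respondents by how they split their time,
# then name each cluster after its dominant activity.
# Activity names, data, and the choice of k-means are illustrative assumptions.

ACTIVITIES = ["querying", "preparing", "analyzing", "building_platforms", "engaging_others"]


def dist2(a, b):
    """Squared Euclidean distance between two time-share vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise average of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def init_centroids(points, k):
    """Farthest-first initialization: deterministic, spreads starting centroids apart."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda c: dist2(p, centroids[c])) for p in points]


def kmeans(points, k, max_iters=100):
    """Alternate assignment and centroid updates until the centroids stop moving."""
    centroids = init_centroids(points, k)
    for _ in range(max_iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def dominant_activity(centroid):
    """Name a cluster after the activity that takes the largest share of time."""
    return ACTIVITIES[max(range(len(ACTIVITIES)), key=lambda i: centroid[i])]


# Made-up respondents: fraction of working time spent on each activity.
respondents = [
    [0.50, 0.30, 0.10, 0.05, 0.05],
    [0.45, 0.35, 0.10, 0.05, 0.05],
    [0.55, 0.25, 0.10, 0.05, 0.05],
    [0.10, 0.10, 0.60, 0.10, 0.10],
    [0.05, 0.10, 0.65, 0.10, 0.10],
    [0.10, 0.05, 0.60, 0.15, 0.10],
]

labels, centroids = kmeans(respondents, k=2)
for c, centroid in enumerate(centroids):
    print(c, dominant_activity(centroid))
```

Here k=2 only because the toy data has two obvious groups; with real survey responses the number of clusters is a modeling choice, and the paper reports nine groups for the Microsoft survey.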

The results showed nine different kinds of data scientist:

Data Preparer: This type of data scientist spends an average of 25% of their time querying data, and about 20% actually preparing data for analysis. Data Preparers are more likely to work with SQL and less likely to work with machine learning algorithms.
Data Shaper: The Data Shaper shares many of the skills of the Data Preparer, but brings additional ones, such as machine learning expertise and experience with tools like MATLAB and Python. Data Shapers are also more likely to have a PhD and less likely to work with SQL or structured data.
Data Analyzer: Data scientists who spend more than half their time analyzing data could fall into this bucket. Other traits of Data Analyzers include more experience with classical statistics, math, and data manipulation, and a predilection for using R.
Platform Builder: You might be a Platform Builder if you spend about half of your time building platforms and instrumenting code to collect data. Platform Builders are more likely to work on distributed systems such as Hadoop and to have “engineer” in their title, and less likely to have a PhD.
Data Evangelist: This type of data scientist spends a good portion of their time engaging with others. Data Evangelists are more likely than the group as a whole to work with line-of-business decision makers and people in product development, and less likely to work with SQL or structured data.
Insight Actor: This data science type spends nearly 60% of her time acting on insights, and nearly 20% disseminating insights from the data. This is a relatively small group, percentage-wise, but it was statistically significant.
50% Moonlighter: Sometimes, you might be a data scientist but not even know it. Software engineers and program managers who spend half their time using data science-related skills and the other half doing something else fall into this category.
20% Moonlighter: Engineers and managers who only dabble in data science (i.e., spend about 20% of their time doing it) fall into this category.
Polymath: This is the “jack of all trades” type of data scientist who spends his time doing all sorts of data-oriented tasks, from building platforms to gather data to analyzing data and acting on it. Polymaths are more likely to have a PhD, more likely to use Python, and more likely to use Bayesian-style Monte Carlo statistics than the group as a whole.

“What is really interesting to me,” Kim said, “is while we think of data science [as] a buzzword, when we look at the…data we saw very different characteristics of different groups of data scientists who have very different kinds of work activities.”

The biggest challenges reported by data scientists may ring a bell for anyone who has worked in data science. The challenges were grouped into three main categories: data, analysis, and people.

Miryung Kim is an Associate Professor in UCLA’s Computer Science Department, where she heads up the Software Engineering and Analysis Laboratory.

On the data front, poor data quality was one of the most commonly reported problems. “Some respondents mentioned that there is an expectation that it is a data scientist’s job to correct data quality issues, even though they are the main consumers of data,” the report states.

Data availability, including missing values and the inability to tap legacy systems for data collection, was also cited as a major challenge. Data integration, including the merging of different streams of data into a single data set for analysis, remains a bugaboo for data scientists around the world.

Scale was the biggest problem related to analysis (which is probably why some still refer to it as “big data”). Survey respondents reported that it can sometimes take too long to collect and analyze the data, whether on Hadoop or on Cosmos, Microsoft’s own big distributed storage and processing framework.

On the personnel side of the data science equation (a factor too often overlooked in many human endeavors), the UCLA researcher identified one major impediment to data science success: communicating the insights the data science team has discovered. Staying up to date on changing tools and technologies is another concern.
