February 12, 2013

Could the Data Scientist Be a Bad Thing for Big Data?

I was recently discussing a market-leading cable company’s revised compensation program with a segment manager there. Strategic changes and new product directions at the company were driving revised marketing measurements.

In this case, marketing’s new plan was tied to ROMI (Return on Marketing Investment), and the manager was understandably concerned about campaign effectiveness. “I don’t want to be deciding what offers to send a customer while some data scientist is running an analytical model using different data,” he said. “Trouble is, I’m never really sure who’s doing what with the data. And if my customer list is skewed, my bonus is toast.”

Business people are increasingly accustomed to analyzing large amounts of complex data in their workaday jobs. As more of them adopt analytics, they recognize the big data trend as a natural evolution. Likewise, statisticians, once relegated to the dark hallways of banks and insurance companies, have emerged to find managers across industries newly engaged in conversations about how to mine big data to drive business value.

The Data Scientist Defined

With big data, data analysis is assuming new dimensions. Take the statistician’s algorithmic prowess, combine it with new data volumes in more complex formats and fold in practical knowledge of how the business uses data to make decisions, and you have the data scientist.

Data scientists—dubbed “The Sexiest Job of the 21st Century” by authors Tom Davenport and D.J. Patil—don’t just run mathematical models against diverse data sets. They may be called upon to suggest how to leverage the data to improve cross-selling techniques, optimize product pricing, prevent fraudulent transactions, and predict a customer’s next likely purchase. But some experts have included data management practices as diverse as data correction, semantic reconciliation, data dictionary maintenance, data visualization, and customer lifetime value modeling in the data scientist’s bag of tricks.

“Data scientists, by definition, combine business acumen with data acumen,” explains P.K. Kannan, Marketing Department Chair at University of Maryland’s Smith Business School. “Data scientists have insight into the firm’s products and services while simultaneously possessing mastery of both data creation and data analysis. In that sense, they’re different from traditional statisticians not only in their business domain knowledge but also in terms of a broader scope of work.” 

The slippery slope lies in Kannan’s “broader scope.” Various definitions of the data scientist include a complex and diverse range of skills, from data integration to in-depth knowledge of business programs to relationship-building and liaison skills. The expectation that the data scientist, let alone any business professional, can perform such a broad array of activities could jeopardize nascent big data efforts. (Indeed, many are already at risk.)

In reality, a company may involve a variety of data-focused experts charged with accessing, defining, cleansing, integrating, and deploying cross-functional business information in the context of customer relationship management, risk analysis, or data warehousing programs. The roles of data administrators, data stewards, business architects, data quality administrators, data analysts, solutions architects, metadata managers, and others could, if not well defined, overlap (or worse, compete) with the data scientist’s. Simply put, it’s more complex than a single function or role. (For fun I wrote a blog post on being a data scientist’s girlfriend.) The long-suffered corporate conundrum of the same question yielding different answers is reflected in millions of financial spreadsheets, and will only be exacerbated as big data projects gain adoption.

Big Data Wins: New Processes, Skills, and Tools

Executives would do well to get out in front of big data efforts and clarify roles. Conducting an inventory of data professionals and their responsibilities is an effective first step in establishing clarity and decision rights. Managers should work with HR to formally identify, create, and document data management roles and establish handoff points between them. The resulting precision not only minimizes duplication of effort but can also delineate clear data ownership boundaries and prevent over-investment.

Role clarity for managing data (however big) is only part of the answer. More formal rules of engagement between data analysts and knowledge workers can drive efficiencies that in turn decrease the time-to-value for business decisions. After all, employees who use the data on a regular basis will always outnumber those who manage it. Like my cable company friend, they are increasingly being measured on their ability to make fact-based decisions from meaningful information.

The manager understood that until the data was authoritative, customer-facing decisions would be error-prone. Because accurate customer addresses and enriched customer profiles mattered so much, the company adopted new data profiling and correction processes. Reconciling customer identities from heterogeneous systems was critical, as was refining customer profiles with unstructured data. Once the cable company formalized these new processes and adopted data quality and master data management, its campaign response rates rocketed to 23 percent. (Customer satisfaction scores rose apace.) All this was done not by but on behalf of data scientists, who used the new capabilities to run new marketing analytics and test the results.
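To make "reconciling customer identities from heterogeneous systems" a little more concrete, here is a minimal Python sketch of one common approach: normalize each record's name and address, then group records that share the same normalized key. It is an illustration only, not a description of what the cable company actually built. The two source systems, the field names, the sample records, and the tiny abbreviation table are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical records from two source systems; every name and value is invented.
BILLING = [
    {"id": "B-1", "name": "Ann  Lee", "address": "12 Oak St."},
    {"id": "B-2", "name": "Raj Patel", "address": "9 Elm Avenue"},
]
SUPPORT = [
    {"id": "S-7", "name": "ann lee", "address": "12 Oak Street"},
    {"id": "S-8", "name": "Maria Gomez", "address": "4 Pine Rd"},
]

# Deliberately small, illustrative abbreviation table (an assumption, not a standard).
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize(text):
    """Lowercase, drop punctuation, collapse whitespace, expand abbreviations."""
    words = re.sub(r"[^a-z0-9\s]", "", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def match_key(record):
    """Build the key used to decide that two records describe the same customer."""
    return (normalize(record["name"]), normalize(record["address"]))


def reconcile(*sources):
    """Group record ids from all sources by their normalized match key."""
    groups = defaultdict(list)
    for source in sources:
        for record in source:
            groups[match_key(record)].append(record["id"])
    return dict(groups)


if __name__ == "__main__":
    for key, ids in reconcile(BILLING, SUPPORT).items():
        print(key, ids)
```

In this toy data, the billing record B-1 and the support record S-7 end up in the same group even though their spelling and punctuation differ. Exact-key matching is the simplest possible rule; production master data management tools typically layer fuzzy matching and survivorship rules on top of it.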

Indeed, technology can play a part in clarifying big data deployment tasks. Project management and workflow solutions that automate the routing of work tasks and pinpoint bottlenecks can streamline a company’s data supply chain. Likewise, data quality and master data management software can highlight data anomalies, standardize data rules and definitions, and automate what are often manual and duplicate activities.

When it comes to big data, conversations around revolutionary breakthroughs often usurp the more prosaic discussions around its mechanics. The business leader’s new challenge is communicating the often-cryptic work involved in managing and integrating data while emphasizing its importance. “The term ‘data science’ has actually been around since the year 2000,” says Kannan. “But the role of data scientist is quite recent, and it’s going to be much more important in the future.” If the importance of data scientists is growing with the advent of big data, the sooner we understand what exactly it is they do, the better.

About the Author

Jill Dyché is a Vice President at SAS and the author of three books on the business value of technology. Reach her at


There is 1 discussion item posted.

Great article Jill
Submitted by ShawnRog on Feb 15, 2013 @ 11:53 AM EST

I agree with the discussion of role definition, especially including HR and planning to avoid duplication of work. I used to think that a data scientist was a data or business analyst on steroids, but as the function has evolved and more companies are challenged with selecting individuals with the right mix of skills, I've come to the conclusion that a data scientist is generally several people with a combined agenda that covers the business aspects as well as the data challenges to deliver data scientist value to an organization.

I enjoyed the article!
