


February 12, 2013

Could the Data Scientist Be a Bad Thing for Big Data?


I was recently discussing a market-leading cable company’s revised compensation program with a segment manager there. Strategic changes and new product directions at the company were driving revised marketing measurements.

In this case, marketing’s new plan was tied to ROMI (Return on Marketing Investment), and the manager was understandably concerned about campaign effectiveness. “I don’t want to be deciding what offers to send a customer while some data scientist is running an analytical model using different data,” he said. “Trouble is, I’m never really sure who’s doing what with the data. And if my customer list is skewed, my bonus is toast.”

Business people are increasingly accustomed to analyzing large amounts of complex data in their workaday jobs. As more of them adopt analytics, they recognize the big data trend as a natural evolution. Likewise, statisticians, once relegated to the dark hallways of banks and insurance companies, have emerged to find managers across industries newly engaged in conversations about how to mine big data to drive business value.

The Data Scientist Defined

With big data, data analysis is assuming new dimensions. Take the statistician’s algorithmic prowess, combine it with new data volumes in more complex formats and fold in practical knowledge of how the business uses data to make decisions, and you have the data scientist.

Data scientists—dubbed “The Sexiest Job of the 21st Century” by authors Tom Davenport and D.J. Patil—don’t just run mathematical models against diverse data sets. They may be called upon to suggest how to leverage the data to improve cross-selling techniques, optimize product pricing, prevent fraudulent transactions, and predict a customer’s next likely purchase. But some experts have included data management practices as diverse as data correction, semantic reconciliation, data dictionary maintenance, data visualization, and customer lifetime value modeling in the data scientist’s bag of tricks.

“Data scientists, by definition, combine business acumen with data acumen,” explains P.K. Kannan, Marketing Department Chair at the University of Maryland’s Smith School of Business. “Data scientists have insight into the firm’s products and services while simultaneously possessing mastery of both data creation and data analysis. In that sense, they’re different from traditional statisticians not only in their business domain knowledge but also in terms of a broader scope of work.”

In Kannan’s “broader scope” lies the slippery slope. Various definitions of the data scientist include a complex and diverse range of skills, from data integration to in-depth knowledge of business programs to relationship-building and liaison skills. The expectation that the data scientist, let alone any business professional, can perform such a broad array of activities could jeopardize nascent big data efforts. (Indeed, many are already at risk.)

In reality, a company may involve a variety of data-focused experts charged with accessing, defining, cleansing, integrating, and deploying cross-functional business information in the context of customer relationship management, risk analysis, or data warehousing programs. Data administrators, data stewards, business architects, data quality administrators, data analysts, solutions architects, metadata managers, and other roles could, if not well defined, overlap or, worse, compete with the data scientist’s. Simply put, managing data is more complex than a single function or role. (For fun I wrote a blog post on being a data scientist’s girlfriend.) The long-standing corporate conundrum of the same question yielding different answers is reflected in millions of financial spreadsheets, and will only be exacerbated as big data projects gain adoption.

Big Data Wins: New Processes, Skills, and Tools

Executives would do well to get out in front of big data efforts and clarify roles. Conducting an inventory of data professionals and their responsibilities is an effective first step in establishing clarity and decision rights. Managers should work with HR to formally identify, create, and document data management roles and establish handoff points between them. The resulting precision not only minimizes duplication of effort but can also delineate clear data ownership boundaries and prevent over-investment.

Role clarity for managing data (however big) is only part of the answer. More formal rules of engagement between data analysts and knowledge workers can drive efficiencies that in turn decrease the time-to-value for business decisions. After all, employees who use the data on a regular basis will always outnumber those who manage it. Like my cable company friend, they are increasingly being measured on their ability to make fact-based decisions from meaningful information.


The manager understood that until the data was authoritative, customer-facing decisions would be error-prone. The importance of accurate customer addresses and enriched customer profiles meant adopting new data profiling and correction processes. Reconciling customer identities from heterogeneous systems was critical, as was refining customer profiles with unstructured data. Once the cable company formalized these new processes and adopted data quality and master data management, its campaign response rates rocketed to 23 percent. (Customer satisfaction scores rose apace.) All this was done not by, but on behalf of, data scientists, who used the new capabilities to run new marketing analytics and test the results.

Indeed, technology can play a part in clarifying big data deployment tasks. Project management and workflow solutions that automate the routing of work tasks and pinpoint bottlenecks can streamline a company’s data supply chain. Likewise, data quality and master data management software can highlight data anomalies, standardize data rules and definitions, and automate what are often manual and duplicated activities.
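To make the reconciliation and standardization steps described above more concrete, here is a minimal Python sketch. It is an illustration only, not the cable company's process or any vendor's product: the record layout, the abbreviation rules, and the function names (standardize, match_key, reconcile, find_anomalies) are all assumptions made for the example. It also matches on exact standardized values, whereas commercial master data management tools typically rely on fuzzier matching.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation rules; in practice data stewards would agree on these.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "apartment": "apt"}


def standardize(text):
    """Lowercase, drop punctuation, collapse whitespace, apply abbreviation rules."""
    words = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def match_key(record):
    """Build a simple match key from the standardized name and address."""
    return (standardize(record["name"]), standardize(record["address"]))


def reconcile(*systems):
    """Group records from several source systems that share a match key."""
    groups = defaultdict(list)
    for system in systems:
        for record in system:
            groups[match_key(record)].append(record)
    return dict(groups)


def find_anomalies(records):
    """Flag records with a blank name or address for manual review."""
    return [r for r in records if not r["name"].strip() or not r["address"].strip()]


# Made-up records from two hypothetical source systems.
billing = [
    {"id": "B-1", "name": "Ann Lee", "address": "12 Main Street, Apt. 4"},
    {"id": "B-2", "name": "Raj Patel", "address": ""},
]
crm = [
    {"id": "C-9", "name": "ANN LEE", "address": "12 Main St Apt 4"},
]

merged = reconcile(billing, crm)
anomalies = find_anomalies(billing + crm)
```

Here the two "Ann Lee" records land in the same group despite their different formatting, while the record with the blank address is flagged for review rather than silently merged. The point is that the marketer and the data scientist would then both work from the same reconciled list.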

When it comes to big data, conversations around revolutionary breakthroughs often crowd out the more prosaic discussions around its mechanics. The business leader’s new challenge lies in communicating the often-cryptic work involved in managing and integrating data while emphasizing its importance. “The term ‘data science’ has actually been around since the year 2000,” says Kannan. “But the role of data scientist is quite recent, and it’s going to be much more important in the future.” If the importance of data scientists is growing with the advent of big data, the sooner we understand what exactly it is they do, the better.

About the Author

Jill Dyché is a Vice President at SAS and the author of three books on the business value of technology. Reach her at Jill.Dyche@sas.com.


Discussion

There is 1 discussion item posted.

Great article Jill
Submitted by ShawnRog on Feb 15, 2013 @ 11:53 AM EST


I agree with the definition of roles discussion, especially including HR and planning to avoid duplication of work. I used to think that a data scientist was a data or business analyst on steroids, but as the function has evolved and more companies are challenged with selecting individuals with the right mix of skills, I've come to the conclusion that a data scientist is generally several people with a combined agenda that covers the business aspects as well as the data challenges to deliver data scientist value to an organization.

I enjoyed the article!


 
