Big Data • Big Analytics • Big Insight

February 12, 2013

Could the Data Scientist Be a Bad Thing for Big Data?

Jill Dyché

I was recently discussing a market-leading cable company’s revised compensation program with a segment manager there. Strategic changes and new product directions at the company were driving revised marketing measurements.

In this case, marketing’s new plan was tied to ROMI (Return on Marketing Investment), and the manager was understandably concerned about campaign effectiveness. “I don’t want to be deciding what offers to send a customer while some data scientist is running an analytical model using different data,” he said. “Trouble is, I’m never really sure who’s doing what with the data. And if my customer list is skewed, my bonus is toast.”

Business people are increasingly accustomed to analyzing large amounts of complex data in their workaday jobs. As more of them adopt analytics, they recognize the big data trend as a natural evolution. Likewise, statisticians, once relegated to the dark hallways of banks and insurance companies, have emerged to find managers across industries newly engaged in conversations about how to mine big data to drive business value.

The Data Scientist Defined

With big data, data analysis is assuming new dimensions. Take the statistician’s algorithmic prowess, combine it with new data volumes in more complex formats and fold in practical knowledge of how the business uses data to make decisions, and you have the data scientist.

Data scientists, whose role authors Tom Davenport and D.J. Patil dubbed “The Sexiest Job of the 21st Century,” don’t just run mathematical models against diverse data sets. They may be called upon to suggest how to leverage the data to improve cross-selling techniques, optimize product pricing, prevent fraudulent transactions, and predict a customer’s next likely purchase. Some experts go further, adding data management practices as diverse as data correction, semantic reconciliation, data dictionary maintenance, data visualization, and customer lifetime value modeling to the data scientist’s bag of tricks.

“Data scientists, by definition, combine business acumen with data acumen,” explains P.K. Kannan, Marketing Department Chair at University of Maryland’s Smith Business School. “Data scientists have insight into the firm’s products and services while simultaneously possessing mastery of both data creation and data analysis. In that sense, they’re different from traditional statisticians not only in their business domain knowledge but also in terms of a broader scope of work.” 

In Kannan’s “broader scope” lies the slippery slope. Various definitions of data scientists include a complex and diverse range of skills, from data integration to in-depth knowledge of business programs to relationship-building and liaison skills. The expectation that the data scientist, let alone any business professional, can perform such a broad array of activities could jeopardize nascent big data efforts. (Indeed, many are already at risk.)

In reality, a company may involve a variety of data-focused experts charged with accessing, defining, cleansing, integrating, and deploying cross-functional business information in the context of customer relationship management, risk analysis, or data warehousing programs. Data administrators, data stewards, business architects, data quality administrators, data analysts, solutions architects, metadata managers, and other roles could, if not well defined, overlap or, worse, compete with the data scientist’s role. Simply put, it’s more complex than a single function or role. (For fun I wrote a blog post on being a data scientist’s girlfriend.) The long-suffered corporate conundrum of the same question with different answers is reflected in millions of financial spreadsheets, and will only be exacerbated as big data projects gain adoption.

Big Data Wins: New Processes, Skills, and Tools

Executives would do well to get out in front of big data efforts and clarify roles. Conducting an inventory of data professionals and their responsibilities is an effective first step in establishing clarity and decision rights. Managers should work with HR to formally identify, create, and document data management roles and establish handoff points between them. The resulting precision not only minimizes duplication of effort but can also delineate clear data ownership boundaries and prevent over-investment.

Role clarity for managing data (however big) is only part of the answer. More formal rules of engagement between data analysts and knowledge workers can drive efficiencies that in turn decrease the time-to-value for business decisions. After all, employees who use the data on a regular basis will always outnumber those who manage it. Like my cable company friend, they are increasingly being measured on their ability to make fact-based decisions grounded in meaningful information.

The manager understood that until the data was authoritative, customer-facing decisions would be error-prone. The importance of accurate customer addresses and enriched customer profiles meant adopting new data profiling and correction processes. Reconciling customer identities from heterogeneous systems was critical, as was refining customer profiles with unstructured data. Once the cable company formalized these new processes and adopted data quality and master data management, its campaign response rates rocketed to 23 percent. (Customer satisfaction scores rose apace.) All this was done not by, but on behalf of, data scientists, who used the new capabilities to run new marketing analytics and test the results.

Indeed, technology can play a part in clarifying big data deployment tasks. Project management and workflow solutions that automate the routing of work tasks and pinpoint bottlenecks can streamline a company’s data supply chain. Likewise, data quality and master data management software can highlight data anomalies, standardize data rules and definitions, and automate what are often manual and duplicate activities.

When it comes to big data, conversations around revolutionary breakthroughs often usurp the more prosaic discussions around its mechanics. The business leader’s new challenge is to communicate the often-cryptic work involved in managing and integrating data while emphasizing its importance. “The term ‘data science’ has actually been around since the year 2000,” says Kannan. “But the role of data scientist is quite recent, and it’s going to be much more important in the future.” If the importance of data scientists is growing with the advent of big data, the sooner we understand what exactly it is they do, the better.

About the Author

Jill Dyché is a Vice President at SAS and the author of three books on the business value of technology. Reach her at Jill.Dyche@sas.com.