February 25, 2013

Data Science and the Decision-maker in the Machine


The conflation of popular data-related buzz terms with actual “data science” could become a problem if it is not straightened out soon, argue academics Foster Provost and Tom Fawcett in a recent article.

In the article, Provost and Fawcett express concern that data science has become intricately intertwined with other data-related concepts of growing importance (i.e., big data and data-driven decision making), and about the threat that this poses to the burgeoning field of data science.

“Companies have realized they need to hire data scientists, academic institutions are scrambling to put together data-science programs, and publications are touting data science as a hot – even ‘sexy’ – career choice,” say the authors. “However there is confusion about what exactly data science is, and this confusion could lead to disillusionment as the concept diffuses into meaningless buzz.”

The authors argue that it’s not the algorithms or techniques that make up data science, but the core principles that underlie those techniques. “In order for data science to serve business effectively, it is important (i) to understand the relationships to these other important and closely related concepts, and (ii) to begin to understand what are the fundamental principles underlying data science,” say the authors.

Provost and Fawcett say data science should be seen as the connective tissue between data-processing technologies (including those for “big data”) and data-driven decision making. “Data science involves much more than just data-mining algorithms,” they write. Instead, the authors argue, data science involves principles, processes, and techniques for understanding phenomena via the (automated) analysis of data, with the ultimate goal of improving decision making, specifically “data-driven decision making” (DDD).

“Data-driven decision making refers to the practice of basing decisions on the analysis of data rather than purely on intuition,” say the authors. “The benefits of data-driven decision making have been demonstrated conclusively.”

The authors cite a study conducted by economist Erik Brynjolfsson and his colleagues from MIT and Penn’s Wharton School on how DDD affects firm performance. Using detailed survey data on the business practices and information technology investments of 179 large publicly traded firms, the study concludes that firms that adopt DDD have output and productivity that are 5-6% higher than would be expected given their other investments and information technology usage.

Provost and Fawcett warn that readers shouldn’t lose sight of the fact that, despite the impression one might get, there is a lot to data processing that is not “data science.” The authors define “big data” to mean “datasets that are too large for traditional data-processing systems and that therefore require new technologies” such as Hadoop, HBase, and CouchDB. They further note that economist Prasanna Tambe of New York University’s Stern School has found that the use of big data technologies correlates with significant additional productivity growth.

“Specifically, one standard deviation higher utilization of big data technologies is associated with 1-3% higher productivity than the average firm,” write Provost and Fawcett.  “One standard deviation lower in terms of big data utilization is associated with 1-3% lower productivity.”

This is important to note, say the authors, because they believe that industry followers should expect a Big Data 2.0 phase to follow Big Data 1.0.  Once companies are capable of flexibly processing massive data, business managers will start asking “What can I now do that I couldn’t do before, or do better than I could do before?”  This paradigm shift, say the authors, will likely usher in a golden era of data science in which the principles and techniques of data science are applied more broadly and deeply than ever before.

In ten years’ time, argue Provost and Fawcett, the predominant technologies will likely have changed or advanced enough that today’s choices will seem quaint. The authors highlight the fact that business decisions are increasingly being made automatically by computer systems, with some of the early purveyors of large-scale data (finance and telecommunications) among the early adopters of automatic decision making. They point toward a day when a chief scientist in a data-science-oriented company will do much less data processing and more data analytics design and interpretation.

Data science, conclude the authors, supports data-driven decision making, and sometimes allows decisions to be made automatically at massive scale. Thus, the authors argue, it’s important to identify the fundamental principles underlying data science that have both theoretical and empirical backing.

“The principles of data science are its own and should be considered and discussed explicitly in order for data science to realize its potential,” conclude Provost and Fawcett.
