Exposing the Data Scientist Myth: Using Big Data Without Them
Many organizations assume that big data initiatives require the near-mythical data scientist. The notion is partly propagated by media attention to the data scientist shortage and by the profession's sprawling responsibilities, which span data preparation, analytics, and fluency in business problems. Harvard Business Review seemingly cemented the perception when it declared data scientist the ‘sexiest’ job of the 21st century.
However, a less acknowledged (yet perhaps more pervasive) reality has quietly emerged in the wake of that hype. The self-service movement has flourished across some of the most vital aspects of the data landscape, with big data as its focus. In numerous situations, data scientists are not required to implement big data staples such as data preparation, analytics and data governance.
The self-service movement has not replaced the need for data scientists, but instead created a new reality in which the business and end users are able to seize control of big data and begin leveraging it without them.
The self-service era has perhaps gained the most traction—and success in automating components of data science—in the realm of analytics. There are multiple self-service options organizations can use to gain analytic insight on big data sets via the cloud, which offers the additional cost advantages of decreased physical infrastructure and pricing based on actual usage. In most instances, organizations must simply migrate their data to the cloud to perform a host of analytic options (descriptive, prescriptive, predictive and diagnostic), which makes this method viable for even small and mid-sized businesses.
Competitive vendors offer assortments of business intelligence tools, plug-ins for popular applications such as CRM, and a variety of visualizations and dashboards for publishing. Some cloud-based analytics providers even perform analysis for their customers, who merely identify business objectives or specific queries and simply wait for the results courtesy of sophisticated graph analysis.
Certain graph technologies can also help end users issue sophisticated queries without writing code due to their visual representation of data objects. “Without writing any SQL you can just click on the screen, specify a query as a sub-graph, and then the system will write the query and figure it out,” Franz CEO Jans Aasman noted about graph-based queries. Significantly, many service providers in the analytics space either employ or are operated by data scientists, which implies that one of the ways the self-service movement is impacting these professionals is by shifting their employment from the enterprise to cloud vendors.
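The sub-graph style of querying Aasman describes can be approximated in a few lines of code. The sketch below is a generic illustration, not AllegroGraph's actual API: the sample triples and the match() helper are hypothetical, but they show how a pattern with variables (here prefixed with '?') can stand in for a hand-written SQL or SPARQL query.

```python
# Minimal sketch of sub-graph pattern matching over a triple store.
# The data and the match() helper are hypothetical illustrations,
# not a real graph database's API.

triples = [
    ("alice", "works_for", "acme"),
    ("bob", "works_for", "acme"),
    ("alice", "knows", "bob"),
    ("acme", "located_in", "boston"),
]

def match(pattern, triples):
    """Return variable bindings (terms starting with '?') that
    satisfy every triple pattern in the sub-graph."""
    results = [{}]
    for p in pattern:
        next_results = []
        for binding in results:
            for t in triples:
                b = dict(binding)
                ok = True
                for term, value in zip(p, t):
                    if term.startswith("?"):
                        # Bind the variable, or check an existing binding
                        if b.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    next_results.append(b)
        results = next_results
    return results

# "Who works for an organization located in Boston?" -- the kind of
# query a user might draw on screen as boxes and arrows:
query = [("?person", "works_for", "?org"),
         ("?org", "located_in", "boston")]
print(match(query, triples))
```

A visual tool would generate the pattern from shapes drawn on screen; the engine then binds the variables by scanning the stored triples, so the user never writes query syntax at all.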
One of the critical aspects of big data analysis that has traditionally been associated with data scientists is the modeling process, which exists at the nexus between analytics and data preparation.
Typically, big data modeling is extremely time consuming and can delay time to insight. Machine learning algorithms, however, can substantially hasten the process by generating candidate models from existing data and its usage patterns.
Organizations can leverage machine learning technologies from service providers specializing in Machine Learning-as-a-Service, or from cloud vendors who focus on an assortment of data discovery and predictive modeling capabilities. Competitive vendors provide models and analytics results with Natural Language Processing-based explanations, as well as with suggestions for action.
Self-service data modeling for big data sets enables end users to incorporate a variety of sources into their analytics options. Cognitive computing solutions specialize in this aspect of analytics and enable the incorporation of on-the-fly, time-sensitive data (weather, news, etc.) with conventional enterprise sources for expedient analytic insight—without end users hiring data scientists.
The self-service movement encompasses data preparation in two fundamental ways, both of which are based on semantics. The first involves data preparation tools and platforms designed to handle the wrangling process, which often includes cleansing, integration, and transformation for analytics or application use.
These solutions provide overviews of enterprise data and identify the relevant attributes that make integration between sources advisable for specific use cases; some catalog metadata for such purposes and combine it with intuitive visualization capabilities to provide such information at a glance. According to Tamr Global Head of Strategy, Operations and Marketing Nidhi Aggarwal, the effect is that “You can actually switch from having the IT and the coding people be the only people that can interact with data, to the business people.” Machine learning algorithms can expedite decisions about which sources or data types to integrate, and act on them, to suit particular use cases. According to Forrester, vendors are equipping more BI and analytics tools with self-service data integration capabilities for ETL.
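A toy version of this attribute matching can illustrate the idea. Everything below is a hypothetical simplification: real platforms such as Tamr rely on trained models, while this sketch scores column pairs only by name similarity and value overlap, then flags the pairs that look like integration candidates.

```python
# Simplistic sketch of how a preparation tool might flag attributes
# that make two sources candidates for integration. Sample data,
# column names, and the scoring heuristic are all hypothetical;
# production tools use trained matching models.
from difflib import SequenceMatcher

crm = {
    "cust_name": ["Acme Corp", "Globex"],
    "cust_email": ["a@acme.com", "g@globex.com"],
}
billing = {
    "customer_name": ["Acme Corp", "Initech"],
    "invoice_id": ["INV-1", "INV-2"],
}

def join_candidates(src_a, src_b, threshold=0.6):
    """Score every column pair by name similarity plus shared values."""
    suggestions = []
    for col_a, vals_a in src_a.items():
        for col_b, vals_b in src_b.items():
            name_score = SequenceMatcher(None, col_a, col_b).ratio()
            overlap = len(set(vals_a) & set(vals_b)) / max(len(set(vals_a)), 1)
            score = 0.5 * name_score + 0.5 * overlap
            if score >= threshold:
                suggestions.append((col_a, col_b, round(score, 2)))
    return sorted(suggestions, key=lambda s: -s[2])

print(join_candidates(crm, billing))
```

Surfacing suggestions like these in a visual catalog is what lets a business user, rather than a coder, decide which sources belong together for a given use case.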
The self-service movement is also enabling end users to bypass conventional preparation and integration concerns with semantically enhanced data lakes. The incorporation of graph-based models and ontologies provides visualizations of data and descriptions of their properties, respectively, enabling users to select which data to integrate for specific purposes. Both methods give end users more control over their data and the preparation process without waiting for data scientists. According to Cambridge Semantics president Alok Prasad:
“If you have to get in line to get your data prepared and ready before you can use it, you’ll only go to the data scientist for bigger problems and not the smaller ones. What you need is the ability to self-serve where users themselves can understand the data, discover the data, and use different elements of data to analyze, filter, and push it to other systems.”
The semantic approach of preparation tools and data lakes is also critical for reinforcing data governance protocols, without which the self-service movement would produce more harm than good.
Smart data technologies can help to facilitate role-based access to data, regardless of where they’re stored, and provide crucial information about data lineage and traceability. Organizations can catalog metadata with certain preparation solutions, enabling them to tag data according to use or user as mandated by governance policies.
The confluence of semantic and metadata consistency can create the foundation for technological conformity to governance principles in a self-service environment.
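One way to picture tag-based governance is a catalog in which every dataset carries metadata tags and every role is granted a set of tags; access is permitted only when a dataset's tags are a subset of the role's grants. The datasets, roles, and tags below are all hypothetical, chosen purely to illustrate the mechanism.

```python
# Minimal sketch of governance via metadata tagging: datasets carry
# tags, roles are granted tags, and an access check compares the two.
# All names and tags are hypothetical examples.

catalog = {
    "sales_2024": {"tags": {"finance", "pii"}},
    "web_clicks": {"tags": {"marketing"}},
    "payroll": {"tags": {"finance", "pii", "restricted"}},
}

role_grants = {
    "analyst": {"marketing", "finance"},
    "hr_admin": {"finance", "pii", "restricted"},
}

def accessible(role):
    """List datasets whose every tag is granted to the role."""
    granted = role_grants.get(role, set())
    return sorted(name for name, meta in catalog.items()
                  if meta["tags"] <= granted)

print(accessible("analyst"))
print(accessible("hr_admin"))
```

Because the check reads only metadata, the same policy applies wherever the data physically lives, which is what makes role-based access workable in a self-service environment.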
Data Science Automation
Consequently, it is not only possible to utilize big data without these professionals, but to do so inexpensively and in accordance with established data governance procedures.
However, the newfound control over data that this movement gives end users does not marginalize data scientists, who still add value by discerning solutions and tailoring applications to address business problems with their unique combination of skills. Nonetheless, organizations can—and do—deploy big data in the midst of the data scientist shortage.
About the author: Jelani Harper has written extensively about data management for the past several years. He specializes in semantic technology, big data, and their many different applications.