December 7, 2017

Data Prep Goes Serverless

George Leopold

The rise of platforms in which cloud providers manage the allocation of computing and storage resources has opened the door to new data services such as serverless data preparation tools. The list of self-service data preparation tools is growing as vendors offer varying approaches to whipping raw data into shape for analysis.

“These tools are aimed at reducing the time and complexity of preparing data and improving analyst productivity,” Gartner noted in a recent review of self-service data preparation tools. Vendors estimate that data scientists spend about 80 percent of their time preparing data for analysis.

Cloud-based serverless data prep tools appear to be making the most headway among data analysts who want to wrangle their own data sets for analysis rather than rely on standard ETL routines developed to plumb data warehouses.

Among the tools gaining the highest marks in the recent Gartner survey of self-service data prep vendors were Lavastorm and Trifacta. Google recently announced the beta availability of Google Cloud Dataprep, a managed data wrangling service developed in collaboration with Trifacta.

The service is designed to accelerate data preparation for analysis using Google Cloud Platform, the partners said. The data prep tool also leverages a serverless data processing engine, Google Cloud Dataflow, which manages computing resources as needed.

Google extended the Trifacta data prep service by adding support for BigQuery and cloud storage.

In one use case, raw event data from Internet of Things and other devices was dropped into BigQuery, where data descriptors were added. The data was then combined with other data feeds to ease queries using tools such as those from Looker, an analytics vendor whose products work with the Google database.
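The prep step in that use case, adding descriptors to raw events and combining them with another feed, can be sketched in a few lines of Python. This is only an illustration of the idea, not the Dataprep or BigQuery workflow itself. The event fields, descriptor codes and owner feed are all hypothetical.

```python
# Hypothetical raw device events, as they might land in a staging table.
raw_events = [
    {"device_id": "d1", "code": 1, "value": 72},
    {"device_id": "d2", "code": 2, "value": 5400},
    {"device_id": "d1", "code": 2, "value": 6100},
]

# Hypothetical descriptor lookup: maps opaque event codes to readable names.
descriptors = {1: "heart_rate_bpm", 2: "step_count"}

# Hypothetical second feed keyed on the same device identifier.
device_owners = {"d1": "alice", "d2": "bob"}


def prepare(events, descriptor_map, owners):
    """Add a descriptor to each event, then join it with the owner feed."""
    prepared = []
    for event in events:
        row = dict(event)  # copy so the raw event is left untouched
        row["metric"] = descriptor_map.get(event["code"], "unknown")
        row["owner"] = owners.get(event["device_id"])
        prepared.append(row)
    return prepared


rows = prepare(raw_events, descriptors, device_owners)
for row in rows:
    print(row)
```

Once rows carry a readable metric name and a joined owner, a downstream query tool can filter and group on those columns without knowing the original event codes.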

In a blog post, Mark Rittman, product manager for analytics at Qubit, said he used the configuration to set up BigQuery tables to receive data via streaming inserts sent by a server running on a Google Compute Engine virtual machine. Using data from his Fitbit health tracker, he assembled data prepped by the Google tool using its “spreadsheet-like interface.”
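Rittman's post is not reproduced here, so the following is only a rough sketch of what a streaming insert involves. It builds the JSON request body that BigQuery's `tabledata.insertAll` REST method accepts and makes no network call. The helper name `build_insert_all_body` and the sample readings are hypothetical.

```python
import json
import uuid


def build_insert_all_body(records):
    """Wrap records in the body shape used by BigQuery's tabledata.insertAll.

    Each row gets an insertId, which BigQuery uses for best-effort
    de-duplication if the same request is retried.
    """
    return {
        "kind": "bigquery#tableDataInsertAllRequest",
        "rows": [
            {"insertId": str(uuid.uuid4()), "json": record}
            for record in records
        ],
    }


# Hypothetical tracker readings; the field names are illustrative only.
readings = [
    {"ts": "2017-12-01T08:00:00Z", "steps": 512},
    {"ts": "2017-12-01T09:00:00Z", "steps": 1304},
]

body = build_insert_all_body(readings)
print(json.dumps(body, indent=2))
```

In practice the server would send this body, with credentials, to the table's `insertAll` endpoint, or hand the records to a client library that does the same.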

What’s missing, Rittman noted, was support for cloud APIs such as Google’s (NASDAQ: GOOGL) natural language processing service. He expects these and other upgrades to be added as Google extends the Trifacta code base to leverage more serverless analytics features.

The embrace of serverless data prep tools underscores the steady enterprise shift of big data analytics away from on-premises Hadoop deployments to the public cloud. Gartner (NYSE: IT) estimates global public cloud services will grow 18 percent this year to $247 billion, and that cloud services will account for the majority of analytics purchases by 2020.

In a community survey released this week, the Cloud Native Computing Foundation reported that 70 percent of members are using the Amazon Web Services (NASDAQ: AMZN) Lambda serverless platform, while Google Cloud Functions, Microsoft (NASDAQ: MSFT) Azure Functions and Apache OpenWhisk are also gaining traction.

Recent items:

Cloud In, Hadoop Out as Hot Repository for Big Data

Looker Rolls New Google BigQuery Tools

Applications: Enterprise Analytics

Technologies: Cloud, Frameworks

Sectors: Manufacturing, Other, Retail

Vendors: AWS, Google, Lambda, Lavastorm, Looker, Trifacta

Tags: data prep, data preparation, data wrangling, self-service analytics, serverless
