May 28, 2014

Why Hadoop Won’t Replace Your Data Warehouse

Alex Woodie

A lot has been made of Hadoop becoming the singular control point for analytics, effectively usurping the enterprise data warehouse (EDW). The recent quest to integrate SQL into Hadoop is an example of that. But a better role for Hadoop is emerging that has it working hand in hand with existing EDW implementations in support of a hybrid big data analytics architecture.

The hype level around Hadoop continues to run high as the big data wave keeps getting bigger. The latest phase of excitability is coming from the Internet of Things (IoT) and the possibilities surrounding the analysis of machine data. Hadoop, with its affordable and flexible file system, is seen as a likely candidate to store and process petabytes worth of semi-structured data.

As the IoT grows, the Hadoop community continues its previously assigned task of retrofitting Apache Hadoop with SQL interfaces to make it easier to use (not to mention more EDW-like). Cloudera and Hortonworks, with their Impala and Stinger initiatives, are leading the push to speed SQL access on Hadoop and eliminate the Hadoop version 1/MapReduce paradigm.

But as the SQL work continues, it raises questions. Are we re-inventing the wheel? Are we duplicating what we already have with EDWs? Is this really where Hadoop should be headed? Is this the best use of our resources?

There are a couple of angles to the SQL-on-Hadoop story. On the one hand, SQL makes Hadoop more accessible to existing business intelligence tools, as well as the millions of data analysts who can write SQL. But there is also a movement afoot to replace EDWs with Hadoop, and SQL support is part and parcel of that drive.

On a purely dollar-per-TB metric, Hadoop beats EDW hands down every time. Hadoop’s capability to deliver massive parallelization by harnessing commodity x86 processors, SATA disks, and plain vanilla Ethernet networks is clearly a force to be reckoned with. But EDWs provide far more than just storage and a SQL interface, and replacing a mature EDW implementation with Hadoop is fraught with potentially unseen complications.

Steve Wooledge, vice president of product marketing at Hadoop distributor MapR Technologies, says Hadoop has a ways to go before it can replicate the functionality delivered by mature EDWs.

“For a sophisticated data warehouse user, there are certain types of workloads, very complex SQL, that mature database technologies [have an edge],” he says. “Hadoop’s just not there yet.”

Customers are exploring the possibility of replacing Teradata or Oracle data warehouses with Hadoop, Wooledge says. “That’s part of their data science experiments. They want to see what Hadoop’s good for. At this point in time, it’s not the right place for a data warehouse,” he says.

“Vendors that talk about replacing the data warehouse are misleading and they’re losing credibility.”

The data analytics giant SAS sees enough data going into Hadoop to make it worthwhile to offer two products on HDFS, including its SAS In-Memory Statistics for Hadoop and Visual Statistics, which it unveiled earlier this month. But that doesn’t mean customers are ditching their Teradata, Oracle, or Greenplum EDWs in favor of Hadoop, says SAS chief data scientist Wayne Thompson.

In particular, EDWs still hold an edge over Hadoop when it comes to serving data analysts who need to update records, Thompson says. “The reason that we have Visual Statistics on other platforms is Hadoop is not so good for updates,” he tells Datanami. “A lot of customers are still going to have their master EDW in a Teradata or Oracle system. We still see a proliferation of these advanced business analysts…who need statistics in these EDWs, and will need them for a long time to come, at least the next five years.”

A new data analytics architecture is emerging that blends next-gen platforms, such as Hadoop, in-memory data grids, and graph databases, with traditional relational databases and data warehouses. Under this hybrid architecture, each component does what it’s best at, enabling customers to get the benefits of new analytic technologies without suffering from the drawbacks.

At cloud analytics software firm Treasure Data, a trend is emerging that sees users augmenting their existing EDWs with its hosted offering, which blends MapReduce and a fast column-oriented data store called Plasma. The company’s 110 customers currently have more than 4 trillion rows of data occupying about 4 petabytes of storage in Treasure Data’s cloud.

“What we see is more folks talking about us as an adjunct cloud facility for big data alongside their classic data warehouses from Oracle, Teradata, and others,” says Rich Ghiossi, Treasure Data’s vice president of marketing. “We’re not confused about the fact that people may already have a data warehouse installed. They look at it and say, for us to put [a big data solution] into that environment is just prohibitive from a cost and manageability standpoint.”

As Hadoop implementations go from proof-of-concept into full production, there will be a desire to expand the Hadoop footprint and do more with it. That is a natural reaction, especially if the organization is getting actionable insights from its Hadoop cluster that would be difficult to get elsewhere.

But the enthusiasm for Hadoop needs to be tempered by the reality of the situation, which is that Hadoop is still a fairly new technology that doesn’t offer all of the enterprise-grade features that EDWs have offered for years. MapR’s Wooledge, who used to work at Teradata, doesn’t see Hadoop offering the same level of user concurrency, dynamic workload management, and data latency capabilities that Teradata offers anytime soon. “Some of the things that Teradata has created are absolutely best in class,” he says.

One workload that Hadoop has excelled at is running ETL jobs. Ten years ago, ETL was a single-threaded process that fed data into the data warehouse from a separate app server. But now those workloads are getting the benefit of massive parallelization thanks to Hadoop. “Now that Hadoop’s here, it makes natural sense to land your data into a file, do your transformation there, and then move the data that’s analyzable into the data warehouse,” Wooledge says.
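The land, transform, and load flow Wooledge describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any vendor implements it: a temporary directory stands in for the Hadoop file system, a thread pool stands in for the cluster's parallel workers, an in-memory SQLite database stands in for the data warehouse, and the three-field record layout and the function names (`land`, `transform`, `load`, `run_pipeline`) are hypothetical.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def land(raw_lines, landing_dir, n_parts=4):
    """Land raw records untouched into several partition files."""
    paths = []
    for i in range(n_parts):
        path = Path(landing_dir) / f"part-{i:05d}.txt"
        # Round-robin split: every n_parts-th line goes to this partition.
        path.write_text("\n".join(raw_lines[i::n_parts]))
        paths.append(path)
    return paths


def transform(path):
    """Parse one partition and keep only the rows that are analyzable."""
    rows = []
    for line in path.read_text().splitlines():
        fields = line.split(",")
        if len(fields) != 3:
            continue  # malformed record: leave it behind in the landing zone
        device, metric, value = (f.strip() for f in fields)
        try:
            rows.append((device, metric, float(value)))
        except ValueError:
            continue  # non-numeric reading: not analyzable
    return rows


def load(rows, conn):
    """Move the cleaned rows into the (stand-in) data warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(device TEXT, metric TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(raw_lines, conn, n_parts=4):
    """Land, transform each partition in parallel, then load the result."""
    with tempfile.TemporaryDirectory() as landing_dir:
        paths = land(raw_lines, landing_dir, n_parts)
        with ThreadPoolExecutor(max_workers=n_parts) as pool:
            partitions = list(pool.map(transform, paths))
    rows = [row for part in partitions for row in part]
    load(rows, conn)
    return len(rows)
```

Calling `run_pipeline(lines, sqlite3.connect(":memory:"))` with a list of comma-separated text lines returns the number of rows that reached the warehouse table. The point of the shape is that the messy parsing happens on the cheap, parallel side, and the warehouse only ever sees rows that are ready to query.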

Over time, as SQL on Hadoop matures, there may be other types of workloads that can move. But for now, organizations are best served by thinking about Hadoop not as a replacement for the EDW, but as another cog in the data analytics machine that must play well with others.


Applications: Enterprise Analytics

Technologies: Cloud, Middleware

Sectors: Retail

Vendors: Cloudera, Hortonworks, MapR Technologies, SAS, Treasure Data

Tags: data warehouse, Hadoop, hybrid data analytics
