June 13, 2013

Syncsort Aims to Bridge Hadoop ETL

Isaac Lopez

When Cloudera took to the virtual airwaves last week with a press event proclaiming Hadoop as the center of gravity in the data warehouse, not everyone agreed. While some would say that Cloudera was overstating Hadoop’s current position, there seems to be little controversy around the idea of Hadoop’s increasing role as a data warehouse offload tool.

We spoke with Josh Rogers, Senior Vice President at Syncsort, about this particular use case. He tells us that while it’s not necessarily in line with the sexy big data promises of predictive and sentiment analysis, data warehouse offloading is becoming an increasingly important on-ramp for Hadoop usage.

“If you look today, most large enterprises have somewhere between 35 and 65 percent of their warehouse capacity dedicated to ELT,” Rogers told us, commenting on his encounters with organizations wrestling with these issues. This is a challenge for organizations, says Rogers, especially as data volumes and workloads increase, bringing with them rising costs and failures to meet SLAs.

This, says Rogers, is a driving force behind the growing popularity of Hadoop as a data warehouse offload. “If I can free up 30% of my data warehouse and put off additional upgrades to an incredibly expensive but powerful data store, that creates real savings in my organization,” he explains. He adds that aside from the immediate, measurable ROI, this use case provides another, less measurable but still valuable return: organizational experience in implementing Hadoop, the lack of which he says is still preventing organizations from getting the most out of their Hadoop installations.

“Organizations need to have extremely talented Java developers to be able to be productive and create data flows or business logic that is going to execute in their clusters,” he explains. Rogers says that while this kind of talent is in short supply, organizations can use ELT process optimization and data warehouse offloading as an opportunity to level up their organizational skill sets. “If you can take a tooling environment that allows people to use their existing set of skills to contribute in this new architecture, that’s very powerful.”

However, says Rogers, gaps remain in making Hadoop a complete ETL solution, including what he refers to as a connectivity gap. Rogers explains that while there are many different mechanisms for moving data into Hadoop, they’re not particularly consistent or coordinated. “It’s a bunch of one-off connections that I have to manually create and feed, and we think that limits people’s ability to move all the data they want on a repetitive basis, consistently and reliably on the platform.”

Last month, Syncsort released new data integration tools geared for Hadoop that attempt to address this issue. As part of its Spring ’13 release, Syncsort announced two new products that extend its DMX data integration offering. Rogers says the Hadoop-centered tools, dubbed DMX-h, aim to close key gaps in the data warehouse offload and ETL use case for organizations using Hadoop.

The new tools, says Rogers, provide users with an ETL application that runs natively on Hadoop, along with a drag-and-drop interface that is familiar to an ETL developer. Explaining the DMX-h ETL edition, Rogers says that they’ve made it a native Hadoop application that interacts with the MapReduce compute framework through a contribution that the company made to open source Apache Hadoop this past January. Through this approach, Rogers says that organizations can gain full connectivity to all their data sources, including mainframe data.

Rogers says that by leveraging that contribution, with their tools running natively within MapReduce on every node in the cluster, they’re able to achieve performance benefits over custom-coded solutions. “We’re the only data integration vendor in the marketplace that can claim that,” says Rogers. “Everyone else is taking essentially a code generated approach – which is, you can use my UI; on the back end, I’ll generate some MapReduce code, or Pig, or HiveQL. That generally is going to be slower and less efficient from an execution perspective than if you custom coded it in MapReduce.”

While Hadoop’s use is by no means limited to data warehouse offloading, Rogers says that it’s the most common use case he’s seeing in the market today – one that he expects to grow as people face scaling problems with their traditional systems.

“You will not be able to develop the appropriate level of competency in terms of big data analytics on existing relational architectures – the traditional [systems] are just not going to get you there,” he says, adding that he expects ETL applications to be to Hadoop what Excel was to Windows 95.

“We believe that while it’s lower level, ETL applications are going to be the killer application that drives adoption of Hadoop because it’s a very logical place to start.”

Applications: Complex Event Processing

Technologies: Frameworks, Storage

Sectors: Other

Tags: database offload, ELT, ETL, Hadoop, Josh Rogers, syncsort
