November 12, 2013

Syncsort Siphons Up Legacy Workloads for Amazon EMR

Alex Woodie

Syncsort is bringing its flavor of super-charged MapReduce job generation capabilities to Amazon’s Elastic MapReduce cloud, the companies announced today. The IronCluster ETL-as-a-service offering will allow Amazon EMR customers to generate faster MapReduce jobs from a GUI, which the companies say will make it easier to migrate expensive data warehouse workloads from Teradata or the IBM mainframe into Amazon’s incredibly inexpensive cloud.

IronCluster is a new, cloud-based version of Syncsort’s existing ETL tool, called DMX-h. What makes DMX-h unique is the way it allows developers to use a GUI to create high-performing MapReduce jobs, as well as its ability to access exotic mainframe assets, such as COBOL copybooks and EBCDIC data. Syncsort worked with Hadoop distributors (namely Cloudera) to get modifications committed into the Apache Hadoop project that allow DMX-h to get down and dirty with MapReduce and Hadoop.
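For readers who have not handled mainframe data, the short Python sketch below illustrates what reading EBCDIC involves. It is not Syncsort's code or how DMX-h works. The record layout, field names, and the `decode_record` helper are invented for illustration, standing in for what a COBOL copybook would describe. The only real API used is Python's built-in `cp037` codec, one common EBCDIC code page.

```python
# Hypothetical fixed-width record layout (field name, width in bytes).
# In practice a COBOL copybook would define this; these fields are invented.
LAYOUT = [("customer_id", 6), ("name", 10), ("state", 2)]


def decode_record(raw: bytes, layout=LAYOUT, codec="cp037"):
    """Decode one fixed-width EBCDIC text record into a dict of strings."""
    expected = sum(width for _, width in layout)
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    record, offset = {}, 0
    for field, width in layout:
        # Slice the field's bytes, translate EBCDIC to text, drop padding.
        record[field] = raw[offset:offset + width].decode(codec).rstrip()
        offset += width
    return record


# Build a sample record by encoding plain text to EBCDIC (code page 037).
sample = "000042JANE DOE  NY".encode("cp037")
print(decode_record(sample))
```

Real mainframe files are usually messier than this: copybooks often declare packed-decimal and binary fields that cannot be decoded as text, which is part of why dedicated tooling exists for the job.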

“Our ETL product is built on the chassis of our sort engine, so it essentially becomes a bit of a Trojan Horse. It allows us to have our ETL product run in a deeply instrumented native way on every node in the cluster,” says Syncsort CEO Lonne Jaffe.

Hadoop has a “mediocre built-in sort” function in MapReduce, he says. But the sort engine in Syncsort’s ETL product is a “high performance engine that includes dozens of algorithms that optimize each workload on each machine down to memory and I/O and CPU levels. So when you craft a MapReduce job using our ETL [interface]… the MapReduce paradigm basically punches out to our tool on each of the nodes, does the fast joins and merges and sorts and aggregations, and you get this level of performance that exceeds even the best well-designed Pig code.”

Syncsort’s customers pay hundreds of thousands of dollars in software license fees for the privilege of using DMX-h to move data from source systems (relational databases, applications, data marts, etc.) and then do the first level of sorting natively in big production Hadoop clusters, Jaffe says. The fact that customers can now get access to IronCluster (DMX-h’s cloud cousin) on Amazon EMR for a fraction of that cost is what could make the offering a success.

“The idea is, with a single click, you’ll spin up a whole series of Amazon MapReduce nodes, with IronCluster running on it, which can do all the great things it can do in the on-prem version,” Jaffe tells Datanami. “It will allow people to do all the Teradata offloading or mainframe data access or Hadoop cluster build-out without having to build anything. It’s just one click, spin up the whole cluster, and siphon off the really expensive workload from your legacy systems into the cloud.”

This allows users to run Hadoop workloads on the platform that makes the most economic sense. “You’ll be able to get started in the cloud in a development environment, then move back on premise, or move workload back and forth, or start on premise and then move to the cloud when you need more capacity during certain parts of the day,” Jaffe says. “If you only need to do the expensive processing for a couple of hours in the evening, you don’t need to buy all these servers, and then buy all these perpetual software licenses or term-based licenses. You just scale it up, and pay per the hour to Amazon.”

What’s more, Syncsort is giving away the software for small EMR environments. “It’ll be free up to 10 nodes, and from there, we’ll charge a very low, essentially hourly charge for usage of nodes,” Jaffe says. Considering the ease with which EMR users can add Hadoop nodes, you could call this Syncsort’s Trojan Horse marketing program.
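The pay-by-the-hour arithmetic Jaffe describes can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not Syncsort's actual pricing: the article gives no rate, so `rate_per_node_hour` is a placeholder the reader must supply, and the sketch assumes only nodes beyond the free 10 are billed, which the quote does not confirm. Amazon's own EMR charges are not included.

```python
FREE_NODES = 10  # from the article: the software is free up to 10 nodes


def hourly_software_charge(nodes: int, rate_per_node_hour: float) -> float:
    """Software charge for one hour of cluster time.

    Assumption: only nodes above the free tier are billed. The rate is a
    placeholder; the article does not state one.
    """
    if nodes < 0 or rate_per_node_hour < 0:
        raise ValueError("nodes and rate must be non-negative")
    billable_nodes = max(0, nodes - FREE_NODES)
    return billable_nodes * rate_per_node_hour


def job_software_charge(nodes: int, hours: float, rate_per_node_hour: float) -> float:
    """Software charge for a job that keeps `nodes` running for `hours`."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hourly_software_charge(nodes, rate_per_node_hour) * hours
```

Under these assumptions, a cluster of 10 nodes or fewer incurs no software charge however long it runs, and a larger cluster that runs only a couple of hours each evening is billed only for those hours.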

Whatever it is, Syncsort is aiming squarely at the heart of Teradata’s customer base and, to a lesser extent, IBM’s System z mainframe franchise. While Teradata denies that Hadoop is having much of an impact on its business, there’s no avoiding the big yellow elephant standing in the corner of the room.

“There was a surprisingly large percentage of companies exhibiting [at Strata + Hadoop World] that had a large part of their business model offloading legacy spend from Teradata to Hadoop,” Jaffe says. The fact that storage per terabyte on Hadoop is several orders of magnitude less expensive than it is for Teradata is a big part of that interest, he says.

Today, Teradata customers may be experimenting with Hadoop, and using their small Hadoop clusters to perform some pre-processing of data before loading it into Teradata. But eventually, as the software around Hadoop matures, Jaffe predicts they will start shutting down their Teradata warehouses.

The momentum behind Hadoop is already big, and it’s just getting bigger. “The world has seen that what you can do with the data once it’s already in Hadoop is getting better and better every day,” he says. “It’s still a little bit immature. But it’s improving rapidly and there’s a lot of money moving into that space. So the next-gen BI tools and the data application companies are all furiously working on either making their system run directly against Hadoop or against something that’s relatively close by, like one of the NoSQL repositories, like HP Vertica.”


