October 9, 2014

Syncsort’s Contribution to Sqoop Moves Big Data From IBM Mainframe to Hadoop

WOODCLIFF LAKE, N.J., Oct. 9 — Syncsort, a global leader in Big Data software, today announced another milestone contribution to the Apache Hadoop ecosystem, incorporating powerful technology into the Apache Sqoop open source project that will allow Hadoop users to easily import and transform data coming from the IBM System z mainframe environment.

“Many organizations are looking to increase efficiency and save money by moving targeted mainframe data and workload processing to Hadoop,” said Charles Zedlewski, vice president, products, Cloudera. “Taken together, Apache Sqoop and Syncsort’s open source contributions will facilitate the importation and transformation of all types of mainframe data, allowing customers to take full advantage of Hadoop’s advanced analytical capabilities.”

As Hadoop has emerged as the dominant data processing platform for the enterprise, there is a growing need to rapidly move mainframe data and transform it into an understandable next-generation Big Data format. Syncsort’s contributions to Apache Sqoop will make it much more cost-effective to store mainframe historical data in HDFS and will also help free up mainframe CPU cycles by allowing customers to move expensive data processing workloads from the mainframe to Hadoop.

The new technology is now committed as SQOOP-1272. It supports loading multiple mainframe data sets to the nodes of a Hadoop cluster in parallel and transforming them into any file format that Apache Sqoop supports. This makes it simple for organizations to integrate data from mainframe databases, such as DB2/z, IMS, Adabas, IDMS, and Datacom, with the rest of the data in a typical next-generation Big Data environment.
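
As a rough illustration of what loading data sets in parallel can look like, the Python sketch below spreads the members of a partitioned data set across a fixed number of parallel workers in round-robin order. It is not Syncsort's or Sqoop's code: the function `assign_members`, the member names, and the round-robin policy are all assumptions made for the example.

```python
from typing import Dict, List


def assign_members(members: List[str], num_mappers: int) -> Dict[int, List[str]]:
    """Spread data set member names across mappers in round-robin order.

    Hypothetical helper for illustration only; not part of Apache Sqoop.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    # Never create more workers than there are members to read.
    workers = min(num_mappers, len(members)) or 1
    plan: Dict[int, List[str]] = {i: [] for i in range(workers)}
    for index, member in enumerate(members):
        plan[index % workers].append(member)
    return plan


if __name__ == "__main__":
    # Hypothetical member names of a partitioned data set.
    members = ["MEMBER1", "MEMBER2", "MEMBER3", "MEMBER4", "MEMBER5"]
    for mapper, names in assign_members(members, 2).items():
        print(mapper, names)
```

Each worker would then read its own members independently and write them out in the chosen target format, which is what lets the import scale with the size of the cluster.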

The contribution also features an open application programming interface (API) to allow anyone to extend support for more complex mainframe data files. Syncsort’s own award-winning DMX-h technology uses this open API, serving as a feature-rich add-on that can handle binary sequential data with COBOL copybook metadata and VSAM datasets. Syncsort’s DMX-h plug-in also allows seamless archiving of mainframe data to Hadoop, preserving its original mainframe record format.

“We will continue to be one of the most prolific contributors to the Apache Hadoop family of projects, adding open source technology that helps simplify and accelerate the process of offloading legacy workloads and data into Hadoop,” said Tendu Yogurtcu, vice president, engineering, Syncsort. “This new open source contribution extends Apache Sqoop with the ability to move Partitioned Data Sets, such as IBM DB2 dump files, from z/OS on the mainframe to Hadoop and to store the data in any Apache Sqoop-supported format.”

Datanami