September 1, 2015

Apache Spark Gets IBM Mainframe Connection

IBM’s recent embrace of Apache Spark is beginning to pay dividends in the form of open source contributions, including a link between mainframe big data and Spark.

Big data software vendor Syncsort, of Woodcliff Lake, N.J., said Tuesday (Sept. 1) it is contributing an IBM z Systems mainframe connector for Apache Spark that gives Spark’s analytics and Spark SQL easier access to mainframe data.

The company described its latest mainframe connector as similar to the Apache Sqoop connector it released as open source software last year, which allows Hadoop users to import and analyze data from the z Systems mainframe environment.

The new Spark connector is designed to simplify specifying the locations of multiple datasets and their associated metadata. It also automatically transfers datasets over a secure connection into Spark DataFrame objects.

Syncsort said users can then combine these DataFrames with data from their other sources for further analysis. The mainframe connector also conforms to Spark’s Data Sources API specification.

Given Spark’s in-memory capabilities, the connector allows queries to access mainframe data without first having to offload it. Supported mainframe formats include fixed- and variable-length records as well as sequential and VSAM files. Syncsort said the connector also handles compressed data transfer, which is designed to reduce network bandwidth requirements.
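For readers unfamiliar with mainframe record formats, the sketch below illustrates what splitting fixed- and variable-length records involves. It is not Syncsort’s implementation and uses no Spark or connector API. It assumes the raw dataset bytes are already available locally, that text fields are EBCDIC (Python’s built-in `cp037` codec, IBM EBCDIC US/Canada), and that variable-length records keep their 4-byte record descriptor word (RDW), whose first two bytes give the big-endian record length including the RDW itself. Block descriptor words are assumed to have been stripped. The function names are hypothetical.

```python
import struct
from typing import Iterator

# Assumption: character data is EBCDIC; "cp037" is Python's IBM EBCDIC US/Canada codec.
EBCDIC = "cp037"


def fixed_records(data: bytes, lrecl: int) -> Iterator[bytes]:
    """Hypothetical helper: split a fixed-length (RECFM=F) image into lrecl-byte records."""
    if lrecl <= 0 or len(data) % lrecl:
        raise ValueError("data length must be a positive multiple of lrecl")
    for off in range(0, len(data), lrecl):
        yield data[off:off + lrecl]


def variable_records(data: bytes) -> Iterator[bytes]:
    """Hypothetical helper: split a variable-length (RECFM=V) image.

    Each record starts with a 4-byte RDW: a 2-byte big-endian length that
    counts the RDW itself, followed by 2 bytes that are assumed to be zero.
    Block descriptor words are assumed to be absent.
    """
    off = 0
    while off < len(data):
        (length,) = struct.unpack_from(">H", data, off)
        if length < 4 or off + length > len(data):
            raise ValueError(f"bad RDW at offset {off}")
        yield data[off + 4:off + length]  # payload without the RDW
        off += length


def decode(record: bytes) -> str:
    """Decode an EBCDIC text record and drop trailing blank padding."""
    return record.decode(EBCDIC).rstrip()


if __name__ == "__main__":
    # Two 6-byte fixed records, blank-padded.
    fixed = "HELLO WORLD ".encode(EBCDIC)
    print([decode(r) for r in fixed_records(fixed, 6)])

    # Two variable records, each prefixed with an RDW.
    def rdw(payload: bytes) -> bytes:
        return struct.pack(">HH", len(payload) + 4, 0) + payload

    var = rdw("ABC".encode(EBCDIC)) + rdw("DE".encode(EBCDIC))
    print([decode(r) for r in variable_records(var)])
```

Running the module prints `['HELLO', 'WORLD']` and then `['ABC', 'DE']`. The sketch only covers unblocked text records. Packed-decimal fields, VSAM, and record layouts would need additional handling that the article attributes to the connector itself.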

“We believe that Apache Spark will play a critical role in a wide variety of next-generation use cases, including streaming ETL and the Internet of Things,” Tendü Yoğurtçu, general manager of Syncsort’s big data business, noted in a statement. Yoğurtçu added that the company plans additional contributions to Spark and related big data projects “to enable a uniform user experience for batch and real-time workloads across all data sources.”

Along with platforms like Spark and Hadoop, the company also focuses on cloud platforms and on Splunk, software used to search and analyze machine-generated data.

The new z Systems mainframe connector for Apache Spark follows IBM’s announcement in June that it would work with Databricks, the company formed by the creators of the analytics engine, to integrate Spark software into the “core” of its analytics and commerce platforms. IBM will also offer Spark as a service on its Bluemix cloud application development platform.

The commitment to Apache Spark also gives IBM another vehicle besides its Watson cognitive computing platform for advancing its machine learning technology.

IBM also said it would open a Spark Technology Center and commit more than 3,500 developers and researchers to Spark-related projects.

IBM’s backing for Apache Spark also includes the donation of its SystemML machine learning technology to the Spark open source project. IBM also said it would leverage current partnerships to train as many as 1 million data scientists and engineers on Apache Spark.

Along with z Systems, IBM also said it plans to host Spark on its Power-based systems.

Syncsort’s z Systems mainframe connector for Spark is now available.
