September 1, 2015

Apache Spark Gets IBM Mainframe Connection

George Leopold

IBM’s recent embrace of Apache Spark is beginning to generate dividends in the form of open source contributions for a mainframe big data link to Spark.

Big data software vendor Syncsort, Woodcliff Lake, N.J., said Tuesday (Sept. 1) it is contributing an IBM z Systems mainframe connector for Apache Spark that would allow easier access to mainframe data using Spark’s analytics and Spark SQL.

The company described its latest mainframe connector as being similar to the Apache Sqoop link it released as open source software last year. That connector allows Hadoop users to import and analyze data coming from the z Systems mainframe environment.

The new Spark connector is designed to ease specifying the location of multiple datasets and associated metadata. It also automatically transfers datasets via a secure connection into Spark’s DataFrame objects.

Syncsort said users could then combine the DataFrame object with their other data sources for further analysis. The mainframe connector also conforms to Spark’s data sources API specification.

Given Spark’s in-memory capabilities, the connector allows queries to access mainframe data without first having to offload data. Supported mainframe record formats include fixed, variable, sequential and VSAM files. Syncsort said the connector also handles compressed data transfer, which is designed to reduce network bandwidth requirements.
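To make the idea of a fixed-length mainframe record concrete, the following is a minimal Python sketch of how such records might be split into named fields. It is not Syncsort's connector code and does not use Spark. The record layout, the field names, the sample values and the choice of the EBCDIC code page cp037 are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (field name, byte offset, byte length).
# A real layout would come from the dataset's metadata, not from this list.
LAYOUT: List[Tuple[str, int, int]] = [
    ("account_id", 0, 6),
    ("name", 6, 10),
    ("balance", 16, 8),
]
RECORD_LENGTH = 24  # every record occupies exactly this many bytes


def decode_fixed_records(raw: bytes, encoding: str = "cp037") -> List[Dict[str, str]]:
    """Split a byte string into fixed-length records and decode each field.

    cp037 is one common EBCDIC code page; it is assumed here, not stated
    by the article.
    """
    if len(raw) % RECORD_LENGTH != 0:
        raise ValueError("data is not a whole number of records")
    rows = []
    for start in range(0, len(raw), RECORD_LENGTH):
        record = raw[start:start + RECORD_LENGTH]
        # Slice each field by offset and length, then drop the space padding.
        row = {
            name: record[offset:offset + length].decode(encoding).strip()
            for name, offset, length in LAYOUT
        }
        rows.append(row)
    return rows


# Made-up sample data: two 24-byte records encoded as EBCDIC.
sample = (
    "000042" + "ALICE".ljust(10) + "00012550"
    + "000043" + "BOB".ljust(10) + "00000975"
).encode("cp037")

rows = decode_fixed_records(sample)
for row in rows:
    print(row)
```

Rows in this shape (one dictionary per record, keyed by field name) are the kind of tabular structure that a DataFrame represents, which is why a fixed layout plus its metadata is enough to describe a dataset to an analytics engine.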

“We believe that Apache Spark will play a critical role in a wide variety of next-generation use cases, including streaming ETL and the Internet of Things,” Tendü Yoğurtçu, general manager of Syncsort’s big data business, noted in a statement. Yoğurtçu added that the company plans additional contributions to Spark and related big data projects “to enable a uniform user experience for batch and real-time workloads across all data sources.”

Along with platforms like Spark and Hadoop, Syncsort also focuses on cloud platforms and Splunk software used to search and analyze machine-generated data.

The new z Systems mainframe connector to Apache Spark follows IBM’s announcement in June that it would work with Databricks, the company formed by the creators of the analytics engine, to integrate Spark software into the “core” of its analytics and commerce platforms. IBM will also offer Spark as a service on its Bluemix cloud application development platform.

The commitment to Apache Spark also gives IBM another vehicle besides its Watson cognitive computing platform for advancing its machine learning technology.

IBM also said it would open a Spark Technology Center while committing more than 3,500 developers and researchers to focus on Spark-related projects.

Backing for Apache Spark also includes the donation of IBM’s SystemML machine learning technology to the Spark open source project. IBM also said it would leverage current partnerships to train as many as 1 million data scientists and engineers on Apache Spark.

Along with z Systems, IBM also said it plans to host Spark on its Power-based systems.

Syncsort’s z Systems mainframe connector to Spark is available here.

Recent items:

IBM, Databricks Join Forces to Advance Spark

Hortonworks Hatches a Roadmap to Improve Apache Spark

