See EBCDIC Run on Hadoop and Spark
Only 20,000 or so of the big beasts still exist in the wild. They’re IBM mainframes, and despite the scorn of a legacy label, they continue to run critical processes companies simply don’t trust to commodity Intel boxes. Today Syncsort announced it’s providing a way for mainframe owners to process data in Spark and Hadoop while keeping the data in its original mainframe data format, EBCDIC.
IBM (NYSE: IBM) is one of the few server makers that use the Extended Binary Coded Decimal Interchange Code (EBCDIC) character encoding to store data, as opposed to the more widespread American Standard Code for Information Interchange (ASCII) encoding used by virtually every other operating system on the planet. IBM uses EBCDIC on two platforms: its System z mainframe and its IBM i midrange line of servers, the “baby mainframe” used by more than 100,000 organizations globally. Fujitsu also uses EBCDIC in its mainframes.
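To see how different the two encodings really are, compare the raw bytes. This illustrative snippet uses Python's standard-library `cp037` codec, one common EBCDIC code page; it is not part of any Syncsort or IBM product:

```python
# Same text, two encodings: EBCDIC (code page 037) vs. ASCII.
text = "HELLO 123"

ebcdic_bytes = text.encode("cp037")  # how a mainframe stores it
ascii_bytes = text.encode("ascii")   # how nearly everything else stores it

print(ebcdic_bytes.hex())  # c8c5d3d3d640f1f2f3 -- 'H' is 0xC8 in EBCDIC
print(ascii_bytes.hex())   # 48454c4c4f20313233 -- 'H' is 0x48 in ASCII
```

Not a single byte matches, which is why data moving between the two worlds normally has to pass through a translation step.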
But because no server is an island today, IBM and Fujitsu customers are constantly translating their EBCDIC data into ASCII to integrate and process it with data on other systems, for both transactional and data warehousing workloads. That extra step is a hassle, and it creates opportunities for bad things to happen.
But now Syncsort has found a way to enable mainframe customers to process their EBCDIC data on x86-based Hadoop and Spark clusters without first translating that data into the ASCII character set. This is important, Syncsort says, because it allows organizations to maintain a natural and untouched lineage of mainframe data for compliance purposes.
How does it work? The Woodcliff Lake, New Jersey, company says its DMX-h data integration software essentially “teaches” Hadoop how to talk EBCDIC.
“DMX-h comes with its own Hadoop InputFormat and OutputFormat implementations to deal with mainframe data in Hadoop MapReduce, so we ‘teach’ Hadoop how to speak EBCDIC,” a company spokesperson tells Datanami. “DMX-h engine running natively in the cluster can process EBCDIC data. The same InputFormat and OutputFormat implementations are used in Apache Spark.”
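Syncsort's actual InputFormat and OutputFormat implementations are proprietary Java classes, but the core job any EBCDIC-aware reader must do can be sketched. The following Python sketch splits a byte stream into fixed-length records and decodes each field with an EBCDIC codec; the record length, field layout, and sample data are all invented for illustration:

```python
import io

# Invented layout: 20-byte fixed-length records, two character fields.
RECORD_LEN = 20
FIELDS = [("name", 0, 12), ("balance", 12, 20)]  # (field, start, end)

def read_records(stream):
    """Yield one dict per fixed-length EBCDIC record in the stream."""
    while chunk := stream.read(RECORD_LEN):
        yield {
            name: chunk[start:end].decode("cp037").strip()
            for name, start, end in FIELDS
        }

# Build a tiny stand-in for a mainframe file: two EBCDIC records.
raw = b"".join(
    row.encode("cp037")
    for row in ("ALICE       00004200", "BOB         00000017")
)
for rec in read_records(io.BytesIO(raw)):
    print(rec)
```

A real Hadoop InputFormat additionally has to handle split boundaries across HDFS blocks and hand records to MapReduce or Spark tasks, but the decode-in-place idea is the same: the bytes stay EBCDIC on disk, and the reader interprets them on the fly.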
Syncsort says this new capability will benefit companies in regulated industries, such as banking, insurance, and healthcare, that have struggled to analyze their mainframe data using Hadoop and Spark because of the need to preserve data in its original EBCDIC format.
Previously, Syncsort addressed the EBCDIC data issue by converting the data into ASCII as part of its ETL offload offering using DMX-h. The software can still convert the mainframe data into ASCII if required, but the new capability, ostensibly, should eliminate the need for that extra step.
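Why does skipping the conversion matter for lineage? One illustrative reason: translating EBCDIC to ASCII produces an entirely different byte stream, so any checksum taken on the cluster copy no longer matches a checksum of the source mainframe file. A minimal sketch (the sample record is invented):

```python
import hashlib

# Bytes as they would sit on the mainframe (EBCDIC code page 037)...
original = "CUSTOMER RECORD 001".encode("cp037")
# ...versus the same text after a conventional ETL conversion to ASCII.
converted = original.decode("cp037").encode("ascii")

# The byte streams differ, so their digests differ too.
print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(converted).hexdigest())
```

Keeping the data in EBCDIC end to end means the bytes in HDFS can be verified against the source byte for byte, which is the kind of audit trail regulated industries care about.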
The new EBCDIC capability solves a technical issue that lets Syncsort’s customers do things that “were previously impossible,” says Tendü Yoğurtçu, the general manager of Syncsort’s big data business. “Not only do we simplify and secure the process of accessing and integrating mainframe data with big data platforms, but we also help organizations who need to maintain data lineage when loading mainframe data into Hadoop,” she says in a statement.
There are other ways to analyze mainframe data in Spark, Hadoop, or both. IBM itself provides a version of its Hadoop distribution called IBM InfoSphere BigInsights for Linux on System z that’s designed to run on the mainframe’s Linux subsystem. But this product works by translating the EBCDIC data into ASCII, according to an IBM senior product marketing manager.
Syncsort, which does a fair amount of business enabling mainframe shops to offload their big ETL workloads from mainframes to Hadoop clusters, also introduced a new DMX Data Funnel capability that allows large collections of database tables to be imported into Hadoop en masse. Companies that regularly need to move large amounts of data into Hadoop will benefit from Data Funnel by being able to move hundreds of tables into HDFS with a single click, the company says.