March 15, 2021

Informatica Accelerates DataOps with Spark, GPUs

Alex Woodie

Informatica today announced that customers can see up to a 5x performance boost for ETL and data management workloads when they run them under its new cloud-based data integration engine that’s powered by Apache Spark and Nvidia GPUs.

Informatica’s new offering, called Cloud Data Integration, is a hosted service designed to enable users to execute a slew of data operations (DataOps) and management tasks, including collecting data, building ETL pipelines, cleaning data, and preparing it for downstream analysis and machine learning tasks.

The offering, which runs as a serverless cloud service, uses Apache Spark as the underlying computational engine. Informatica also uses Nvidia’s RAPIDS Accelerator, which enables the Spark code to run atop Nvidia GPUs.
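For readers curious how Spark is pointed at GPUs in general, here is a minimal sketch of the properties typically used to load the open-source RAPIDS Accelerator plugin. It is not Informatica's configuration, which the announcement does not describe. The helper names, the jar path, and the GPU amounts are illustrative assumptions.

```python
# Sketch only: Spark properties commonly used to enable the open-source
# RAPIDS Accelerator. The helper names, jar path, and GPU amounts are
# assumptions for illustration, not values from Informatica.

def rapids_spark_conf(jar_path, gpus_per_executor=1, tasks_per_gpu=4):
    """Return Spark properties that load the RAPIDS Accelerator SQL plugin."""
    return {
        # Put the accelerator jar on the classpath (placeholder path).
        "spark.jars": jar_path,
        # Register the plugin with Spark 3.x.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Allow supported SQL/DataFrame operators to run on the GPU.
        "spark.rapids.sql.enabled": "true",
        # GPU scheduling: GPUs per executor, and the GPU share each task claims.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }


def to_submit_args(conf):
    """Flatten the properties into spark-submit style '--conf key=value' pairs."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

Passing the result of `to_submit_args` to `spark-submit` leaves the job code itself unchanged, which is the appeal of the plugin approach: existing Spark SQL and DataFrame logic does not need to be rewritten for the GPU.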

The combination of the Spark code and the GPUs resulted in a significant speedup as well as cost savings. According to Informatica, Cloud Data Integration runs 5 times faster than similar offerings, with 72% lower total cost of ownership (TCO).

No sophisticated Spark skills are needed to use the new service, Informatica says. Users can work in a “simple drag-and-drop GUI-based development experience” that converts “simple mappings to sophisticated Spark code that can execute on GPUs at scale,” the Silicon Valley firm says in a press release.

It’s all about data democratization, which is “the holy grail of digital transformation initiatives,” according to Jitesh Ghai, Informatica’s chief product officer. “You can’t leverage the power of data and gain valuable insights if you are restricted in your data access,” Ghai says in a press release. “Our collaboration with NVIDIA is valuable to us in bringing enterprise-scale data democratization and narrowing the gap between the data-haves and the data-have-nots within the enterprise.”

Informatica says its Cloud Data Integration offering supports more than 3,000 metadata-aware connectors for an array of file types, including JSON, XML, logs, and clickstream data. The offering supports ETL and ELT workloads, and features more than 100 prebuilt function templates for common data mappings and transformations.

Cloud Data Integration runs in elastic Kubernetes clusters on AWS, Azure, and Google Cloud. It supports real-time change data capture (CDC), which lets it extract data from production databases running on Windows, Linux, Unix, and IBM i systems. It also supports pushdown optimization, which converts workloads to optimized SQL code for popular cloud data warehouses. For more info, see www.informatica.com.
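The pushdown idea can be illustrated with a toy example: instead of pulling rows out of the warehouse and transforming them in the integration engine, the mapping is translated into SQL that the warehouse runs itself. The function below is a hypothetical sketch of that translation for a simple filter-and-aggregate mapping. It does not reflect how Informatica generates SQL, and the table and column names are made up.

```python
# Toy illustration of pushdown: express a simple mapping as one SQL statement
# so the data warehouse does the work. Hypothetical helper, not Informatica's.

def mapping_to_sql(source, columns, filter_expr=None, group_by=None):
    """Translate a source table, output columns, and optional filter/grouping into SQL."""
    sql = f"SELECT {', '.join(columns)} FROM {source}"
    if filter_expr:
        sql += f" WHERE {filter_expr}"
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql


# Example mapping: total shipped order amount per region (made-up schema).
query = mapping_to_sql(
    source="orders",
    columns=["region", "SUM(amount) AS total"],
    filter_expr="status = 'shipped'",
    group_by=["region"],
)
```

Here `query` is a single statement the warehouse can execute in place, so only the aggregated result needs to leave the database.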


Applications: Enterprise Analytics

Technologies: Cloud, Frameworks

Tags: CDC, ELT, ETL, GPU, serverless, Spark
