April 13, 2021

Cloudera, Nvidia Team to Speed Cloud AI via Spark

George Leopold

Cloud access to GPUs for AI development will expand under a partnership between Cloudera and Nvidia that calls for the data cloud provider to integrate Nvidia’s accelerated Apache Spark 3.0 platform as a way to scale data science workflows.

RAPIDS data science libraries running on Nvidia GPUs are designed to speed emerging AI pipelines and boost the performance of data analytics and machine learning workflows, the partners said Monday (April 12). The partnership was among many announced during this week’s Nvidia GPU Technology Conference.

Cloudera (NYSE: CLDR), Palo Alto, Calif., has been working with Nvidia (NASDAQ: NVDA) since last year to deploy GPU-accelerated AI applications via the RAPIDS accelerator across hybrid and multi-cloud deployments. Spark 3.0 is the first release offering GPU acceleration for analytics and AI workloads.

The RAPIDS ecosystem includes Spark creator Databricks’ web-based platform for big data processing and Anaconda, an open source distribution of the Python and R programming languages for data science and machine learning.

The integration of RAPIDS with the cloud data platform “enables accelerated and scalable big data pre-processing, and workflows without code changes,” Scott McClellan, Nvidia’s senior director of product management, noted in a blog post detailing the AI and data analytics collaboration.

The cloud integration is aimed at enterprise data engineers and data scientists looking to overcome bottlenecks created by torrents of increasingly unstructured data. GPU-accelerated Spark processing accessible via the cloud would help break logjams that slow the training and deployment of machine learning models, the partners said.

“Apache Spark is a cornerstone of the machine learning and data analytics pipelines enterprises rely on to remain competitive,” McClellan said.

Cloudera said the RAPIDS accelerator for Apache Spark will initially be available this summer on its private data cloud service. The partners plan to roll out other acceleration tools on the Cloudera data cloud, starting in May with accelerated deep learning and machine learning tools.

Built on the CUDA API model, Nvidia’s RAPIDS software libraries allow data science and analytics pipelines to be executed on GPUs via familiar interfaces such as Pandas.

The partners noted that accelerating machine learning workflows has previously been problematic. Accessing GPUs via the Apache Spark accelerator running on Cloudera’s data platform gives data scientists native access to the workflow accelerator, in the cloud or on-premise, through Cloudera’s private cloud infrastructure.

Nvidia said integrating RAPIDS with machine learning frameworks and scheduling GPU jobs via Spark 3.0 enables accelerated model training and tuning. “This allows data scientists and [machine learning] engineers to have a unified, GPU-accelerated pipeline for ETL and analytics,” the GPU leader added.

The RAPIDS library coupled with the Apache Spark distributed computing framework is also promoted as accelerating Spark SQL and DataFrame processing via GPUs without code changes.
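For readers wondering what "without code changes" can look like in practice, the sketch below shows one plausible setup: the accelerator is switched on through Spark configuration rather than through edits to SQL or DataFrame code. The property names are those documented for the RAPIDS Accelerator for Apache Spark and for Spark 3.0 GPU scheduling. The values, the `RAPIDS_SPARK_CONF` name and the `to_submit_args` helper are illustrative assumptions, not details from the Cloudera or Nvidia announcement, and a real deployment would also need the accelerator jar on the classpath.

```python
# Illustrative sketch only: the values below are assumptions, not settings
# taken from the Cloudera/Nvidia announcement.
RAPIDS_SPARK_CONF = {
    # Load the RAPIDS Accelerator SQL plugin into Spark.
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    # Let the plugin move supported SQL/DataFrame operations to the GPU.
    "spark.rapids.sql.enabled": "true",
    # Spark 3.0 GPU scheduling: GPUs per executor (assumed value).
    "spark.executor.resource.gpu.amount": "1",
    # Fraction of a GPU each task requests, so four tasks share one GPU here.
    "spark.task.resource.gpu.amount": "0.25",
}


def to_submit_args(conf):
    """Hypothetical helper: turn a settings dict into spark-submit --conf flags."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print the flags that would be appended to a spark-submit command line.
    print(" ".join(to_submit_args(RAPIDS_SPARK_CONF)))
```

The existing Spark job would be submitted unchanged with these flags. Under this setup, operations the plugin does not support would be expected to run on the CPU as before.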


