November 17, 2023

Cloudera Integrates NVIDIA’s H100 GPU for Advanced Machine Learning Workloads

SANTA CLARA, Calif., Nov. 17, 2023 — Cloudera has announced additional support for key NVIDIA technologies in public and private clouds to help enable customers to efficiently build and deploy best-in-class applications for artificial intelligence.


“GPU acceleration applies to all phases of the AI application lifecycle – from data pipelines for ingestion and curation, data preparation, model development and tuning, to inference and model serving,” said Priyank Patel, Vice President of Product Management at Cloudera. “NVIDIA’s leadership in AI computing perfectly complements Cloudera’s leadership in data management, offering customers a complete solution to harness the power of GPUs across the entire AI lifecycle.”

This new phase in Cloudera’s technology collaboration with NVIDIA adds multigenerational GPU capabilities for data engineering, machine learning and artificial intelligence in both public and private clouds:

  1. Accelerate AI and Machine Learning Workloads in Cloudera on Public Cloud and On-Premises Using NVIDIA GPUs
    Cloudera Machine Learning (CML) is a leading service of Cloudera Data Platform that empowers enterprises to create their own AI applications, unlocking the potential of open-source Large Language Models (LLMs) by utilizing their own proprietary data assets to create secure and contextually accurate responses.
    The CML service now supports the cutting-edge NVIDIA H100 GPU in public clouds and in data centers. This next-generation acceleration empowers Cloudera’s data platform, enabling faster insights and more efficient generative AI workloads. This results in the ability to fine-tune models on larger datasets and to host larger models in production. The enterprise-grade security and governance of CML means businesses can leverage the power of NVIDIA GPUs without compromising on data security.
  2. Accelerate Data Pipelines with GPUs in Cloudera Private Cloud
    Cloudera Data Engineering (CDE) is a data service that enables users to build reliable and production-ready data pipelines from sensors, social media, marketing, payment, HR, ERP, CRM or other systems on the open data lakehouse with built-in security and governance, orchestrated with Apache Airflow, an open source platform for authoring, scheduling and monitoring workflows.
    With NVIDIA Spark RAPIDS integration in CDE, extract, transform, and load (ETL) workloads can now be accelerated without the need to refactor. Existing Spark ETL applications can seamlessly be GPU-accelerated 7x overall and up to 16x on select queries compared to standard CPUs (based on internal benchmarks). This allows customers of NVIDIA to take advantage of GPUs in upstream data processing pipelines, increasing utilization of these GPUs and demonstrating higher return on investment.
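To illustrate what acceleration "without the need to refactor" can look like in practice, here is a minimal sketch based on the open-source RAPIDS Accelerator for Apache Spark, which is enabled through Spark configuration rather than code changes. The announcement does not describe how CDE exposes these settings, so the helper names (rapids_spark_conf, to_submit_args), the default values, and the job file name are assumptions for illustration only; the configuration keys are those documented for Apache Spark and the RAPIDS Accelerator.

```python
from __future__ import annotations


def rapids_spark_conf(gpus_per_executor: int = 1, tasks_per_gpu: int = 4) -> dict[str, str]:
    """Build Spark settings that turn on the RAPIDS Accelerator plugin.

    Hypothetical helper: the defaults are illustrative, not Cloudera's values.
    """
    if gpus_per_executor < 1 or tasks_per_gpu < 1:
        raise ValueError("gpus_per_executor and tasks_per_gpu must be >= 1")
    return {
        # Load the RAPIDS Accelerator SQL plugin into the Spark session.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Let supported SQL/DataFrame operators run on the GPU.
        "spark.rapids.sql.enabled": "true",
        # Standard Spark GPU scheduling: GPUs requested per executor.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Fraction of a GPU claimed by each task (several tasks share one GPU).
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Render the settings as spark-submit style '--conf key=value' pairs."""
    args: list[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # The ETL application itself ("etl_job.py" is a placeholder) stays unchanged.
    print(" ".join(["spark-submit", *to_submit_args(rapids_spark_conf()), "etl_job.py"]))
```

The point of the sketch is that the existing Spark job is passed through untouched; only the submission settings differ. Operators the plugin does not support fall back to the CPU, which is one reason speedups vary from query to query.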

“Organizations are looking to deploy a number of AI applications across a wide range of data sets,” said Jack Gold, President of J.Gold Associates. “By offering their customers the ability to accelerate both machine learning and inference by leveraging the power of the latest generation of NVIDIA accelerators in cloud and/or hybrid cloud instances, Cloudera is enabling users of both their data lakehouse and data engineering tools to optimize time to market and train models specific to their own data resources. This kind of capability is a key differentiator for enterprises looking at making LLMs a mission critical part of their solution set.”

“We need to be able to make accurate decisions at speed utilizing vast swathes of data. That challenge is ever-evolving as data volumes and velocities continue to increase,” said Joe Ansaldi, IRS/Research Applied Analytics & Statistics Division (RAAS)/Technical Branch Chief. “The Cloudera and NVIDIA integration will empower us to use data-driven insights to power mission-critical use cases such as fraud detection. We are currently implementing this integration and are already seeing over 10 times speed improvements for our data engineering and data science workflows.”

Learn more about Cloudera Machine Learning now supporting the NVIDIA H100 GPU and NVIDIA Spark RAPIDS integration in CDE here.

About Cloudera

Cloudera believes data can make what is impossible today, possible tomorrow. We empower people to transform their data into trusted enterprise AI so they can reduce costs and risks, increase productivity, and accelerate business performance. Our open data lakehouse enables secure data management and portable cloud-native data analytics helping organizations manage and analyze data of all types, on any cloud, public or private. With as much data under management as the hyperscalers, we’re a data partner for the top companies in almost every industry. Cloudera has guided the world on the value and future of data, and continues to lead a vibrant ecosystem powered by the relentless innovation of the open source community.

Source: Cloudera