November 17, 2023

Cloudera and NVIDIA Partner To Expand AI Capabilities


Hybrid and multi-cloud environments have revolutionized how businesses store, process, and manage data. With the rise of new technologies, such as artificial intelligence and machine learning, data management is set to get a significant boost. 

Cloudera, an enterprise data management and analytics platform, announced further support for NVIDIA’s advanced technology in private and public clouds. The collaboration will empower customers to construct and deploy AI applications with increased efficiency. Cloudera and NVIDIA had previously collaborated to accelerate data analytics and AI in the cloud. 

“GPU acceleration applies to all phases of the AI application lifecycle – from data pipelines for ingestion and curation, data preparation, model development, and tuning, to inference and model serving,” said Priyank Patel, Vice President of Product Management at Cloudera. “NVIDIA’s leadership in AI computing perfectly complements Cloudera’s leadership in data management, providing customers with a comprehensive solution to harness the power of GPUs across the entire AI lifecycle.”

Founded in 2008, Cloudera is the only cloud-native platform purpose-built to run on all major public cloud providers, including Azure, AWS, and GCP. The company is one of the leaders in the cloud database management system sector and offers solutions for customer analytics, IoT, security, risk, and compliance. Cloudera has recently increased its focus on harnessing the power of AI. Earlier this month, it announced a partnership with vector database leader Pinecone with the goal of accelerating GenAI work.

One of the core benefits of Cloudera’s latest collaboration with NVIDIA is that users can make better use of Large Language Models (LLMs) through the Cloudera Machine Learning (CML) platform, which now supports the NVIDIA H100 GPU.

Organizations can now use their own proprietary data assets to create secure and contextually accurate responses. In addition, they can fine-tune models on large datasets and hold larger models in production. This means customers can harness the power of NVIDIA GPUs without compromising on data security.


Another key benefit is the ability to accelerate data pipelines with GPUs in Cloudera private cloud. Cloudera Data Engineering (CDE) is a data service designed to enable users to build production-ready data pipelines from various sources. With NVIDIA Spark RAPIDS integration in CDE, extract, transform, and load (ETL) workloads can now be accelerated without the need to refactor existing code.
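To illustrate what "without the need to refactor" can look like in practice, here is a minimal sketch based on the open-source RAPIDS Accelerator for Apache Spark, which is enabled through Spark configuration rather than code changes. How CDE exposes this setting is not described in the announcement, so the job name `etl_job.py`, the helper function, and the GPU resource amounts below are illustrative assumptions, not Cloudera's documented procedure.

```python
import shlex

# Configuration keys used by the open-source RAPIDS Accelerator for Apache Spark.
# The GPU amounts are placeholder values chosen for illustration only.
RAPIDS_CONF = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",   # loads the accelerator plugin
    "spark.rapids.sql.enabled": "true",              # turns on GPU SQL acceleration
    "spark.executor.resource.gpu.amount": "1",       # one GPU per executor (assumed)
    "spark.task.resource.gpu.amount": "0.25",        # four tasks share a GPU (assumed)
}


def spark_submit_command(app_path, gpu=True):
    """Build a spark-submit argument list for a hypothetical ETL job.

    The application path is identical in both modes; only --conf flags differ.
    """
    args = ["spark-submit"]
    if gpu:
        for key, value in RAPIDS_CONF.items():
            args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


cpu_cmd = spark_submit_command("etl_job.py", gpu=False)
gpu_cmd = spark_submit_command("etl_job.py", gpu=True)

print(shlex.join(cpu_cmd))
print(shlex.join(gpu_cmd))
```

The point of the sketch is that the same job script is submitted in both cases; the GPU run differs only in the configuration passed alongside it.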

According to internal benchmark testing, GPU acceleration can speed up ETL applications by 7x overall, and by up to 16x on select queries, compared to standard CPUs. This is a significant boost for customers looking to increase the utilization of GPUs, take advantage of GPUs in upstream data processing pipelines, and demonstrate a high return on investment.

According to Joe Ansaldi, IRS/Research Applied Analytics & Statistics Division (RAAS)/Technical Branch Chief, “The Cloudera and NVIDIA integration will empower us to use data-driven insights to power mission-critical use cases such as fraud detection. We are currently implementing this integration and are already seeing over 10 times speed improvements for our data engineering and data science workflows.”
