July 20, 2015

Cloudera Expands Spark Support

George Leopold

Data management specialist Cloudera is targeting “data at scale” with the rollout of an open source project dubbed Ibis designed to make Hadoop more accessible to data scientists.

Along with its Ibis initiative that leverages the Python language, Cloudera said Monday (July 20) its big data push includes support for Apache Spark MLlib, the machine-learning library, in its upcoming release of Cloudera Enterprise 5.5. A Hadoop applications conference is also planned for October.

Cloudera, Palo Alto, Calif., said Ibis would allow data scientists to fully utilize the Python stack as Hadoop is used for more complex workloads. The project reflects the importance of the Python language in data science as well as the scaling of Hadoop from a batch-processing tool to a cornerstone of a big data ecosystem.

“We want to build on this momentum and make Hadoop’s infrastructure more accessible,” Wes McKinney, a Cloudera software engineer, noted in a statement. “We’re doing that by bringing Python more fully into the ecosystem, expanding our support for machine learning on Spark and focusing on the real-world, practical applications of data science.”

Python development has largely been confined to local data processing and smaller data sets, which limits its utility for crunching big data. It is now being used for automating ETL and other tasks. Cloudera Labs’ Ibis data analysis framework is intended to allow Python users to process data at scale without sacrificing performance.

The initial version of Ibis supports Python capabilities such as built-in analytics via Impala, the SQL query engine for Hadoop, for simplified ETL. Later versions will include additional Python packages and the ability to author Python functions, Cloudera said.

Impala also provides Python users with a native platform for Hadoop that improves performance and enables the scaling needed for big data analytics.

Cloudera said Ibis is available as a preview in Cloudera Labs, its “virtual incubator” for new development projects. Ibis is an Apache-licensed project and open to contributions from the developer community, the company added.

As an early supporter of Spark, Cloudera has been integrating the data processing engine into the Hadoop ecosystem. Among its efforts are a Spark-on-YARN integration for shared resource management, integration with Apache Kafka and Apache HBase, and new Spark features such as data loss protection.

Cloudera said it has contributed more than 370 patches and 43,000 lines of code to Spark and is driving Spark development with partner Intel.

As part of the effort, Cloudera also is adding built-in support for Spark MLlib to its Enterprise 5.5 platform scheduled for release later this year. The integration is intended to allow data scientists to leverage scalable machine learning while harnessing Spark’s performance. The Cloudera platform already includes Spark core and Spark Streaming.

The company also announced this week it is sponsoring a conference for data scientists focusing on Hadoop applications. The “Wrangle Conference” is scheduled for Oct. 22 in San Francisco.


