July 20, 2015

Cloudera Introduces Ibis

PALO ALTO, Calif., July 20 — As the amount of data continues to grow exponentially, data scientists increasingly need the ability to perform full-fidelity analysis of that data at massive scale. Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop, today announced a number of new initiatives to enable data scientists to take advantage of big data and Hadoop for data analysis with more complex workflows.

Beginning with the introduction of Ibis, an open source project incubating within Cloudera Labs, the company is enabling advanced data analysis on a 100% Python stack, bringing a native Python experience to Hadoop at scale. As a direct contributor to Hadoop and an industry leader in Hadoop education, Cloudera also announced that it will host and organize the first-ever Wrangle Conference, an event focused exclusively on real-world applications of data science, from the startup to the enterprise.

“Hadoop has evolved dramatically over the last decade, from a batch processing tool to an entire ecosystem that powers most of today’s information architecture as well as traditional BI workloads,” said Wes McKinney, a software engineer at Cloudera and the creator of Python pandas. “We want to build on this momentum and make Hadoop’s infrastructure more accessible to the data science community. We’re doing that by bringing Python more fully into the ecosystem and focusing on the real-world, practical applications of data science.”

Ibis

Cloudera recognizes the importance of the Python language in modern data engineering and data science and how, thanks to its use in more complex workflows, it has become a primary language for data transformation and interactive analysis. Until now, Python development has been confined to local data processing and smaller data sets, requiring data scientists to make many compromises when attempting to work with big data. Using Ibis, a new open source data analysis framework, Python users will finally be able to process data at scale without compromising user experience or performance.

The initial version of Ibis provides an end-to-end Python experience with comprehensive support for the built-in analytic capabilities in Impala for simplified ETL, data wrangling, and analytics. Upcoming versions will allow users to leverage the full range of Python packages as well as express efficient custom logic using Python. By integrating with Impala, the leading MPP database engine for Hadoop, Ibis can achieve the interactive performance and scalability necessary for big data.

“With its usability, extensibility and robust third-party library ecosystem, it’s easy to understand why Python is the open source language of choice for so many data scientists. However, we recognize its limitation – where it’s unable to achieve high performance at Hadoop-scale,” said Wes McKinney. “With Ibis, our vision is to provide a first-class Python experience on large scalable architectures like Hadoop, with full access to the ecosystem of Python tools.”

Ibis is available as a preview in Cloudera Labs (cloudera.com/labs), a virtual incubator for new projects that further enrich the Hadoop community and ecosystem. Ibis is an Apache-licensed project and open to contributions from the open source community (github.com/cloudera/ibis).

For more details, read about the technical vision for Ibis at http://blog.cloudera.com/blog/2015/07/ibis-on-impala-python-at-scale-for-data-science.

Wrangle Conference

In light of Hadoop’s wide-ranging flexibility and practicality, and because data scientists can now leverage its power to solve some of today’s most pressing problems, Cloudera has announced Wrangle, a single-day, single-track industry event that will dive into the principles, practice, and application of data science from the startup to the enterprise. Presenters include data scientists from Facebook, Salesforce, Uber, and more, who will share the most challenging problems they’ve faced and what they’ve learned. Wrangle will debut this fall, on October 22, in San Francisco.

Registration for Wrangle (wrangleconf.com) is currently open by invitation only, with public access available soon.

Big data technologies are critical tools for data scientists. Whatever the use case, however complex the problem, and whichever tools they prefer, Cloudera is ensuring that data scientists can easily leverage the power of Hadoop.

About Cloudera

Cloudera is revolutionizing enterprise data management by offering the first unified platform for big data, an enterprise data hub built on Apache Hadoop. Cloudera offers enterprises one place to store, access, process, secure, and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Cloudera’s open source big data platform is the most widely adopted in the world, and Cloudera is the most prolific contributor to the open source Hadoop ecosystem. As the leading educator of Hadoop professionals, Cloudera has trained over 40,000 individuals worldwide. Over 1,700 partners and a seasoned professional services team help deliver faster time to value. Leading organizations in every industry plus top public sector organizations globally run Cloudera in production.

Source: Cloudera
