September 26, 2016

Pentaho Announces Five New Data Integration Enhancements

Sept. 26 - Pentaho, a Hitachi Group Company, today announced five new improvements, including SQL on Spark, to help enterprises overcome big data complexity, skills shortages and integration challenges in complex enterprise environments. These big data integration enhancements help IT teams deliver value from big data projects faster with existing resources by eliminating the need for manual coding, providing tighter security and supporting more of the big data technology ecosystem.

1. More Apache Spark Integration

Pentaho expands its existing Spark integration in the Pentaho platform, for customers that want to incorporate this popular technology to:

  • Lower the skills barrier for Spark – data analysts can now query and process Spark data via Pentaho Data Integration (PDI) using SQL on Spark
  • Coordinate, schedule, reuse, and manage Spark applications in data pipelines more easily and flexibly – expanded PDI orchestration for Spark Streaming, Spark SQL and Spark machine learning (Spark MLlib and Spark ML) to support the growing number of developers who use multiple Spark libraries
  • Integrate Spark apps into larger data-driven processes and get more out of them – PDI Orchestration of Spark applications written in Python benefits developers writing Spark applications in this popular language

2. Expanded Metadata Injection Capabilities

Pentaho’s unique metadata injection capability, which helps onboard multiple data types faster, allows data engineers to dynamically generate PDI transformations at runtime instead of having to hand-code each data source, reducing costs by 10X. Pentaho adds over 30 compatible PDI transformation steps, including operations related to Hadoop, HBase, JSON, XML, Vertica, Greenplum, and other big data sources.

3. Expanded Hadoop Data Security Integrations

Securing big data environments can be extremely difficult because the technologies that define authentication and access are continuously evolving. Pentaho expands its Hadoop data security integration to promote better big data governance, protecting clusters from intruders. These integrations include enhanced Kerberos integration for secure multi-user authentication and Apache Sentry integration to enforce rules that control access to specific Hadoop data assets.

4. Apache Kafka Support

Apache Kafka’s increasingly popular publish/subscribe messaging system handles the large data volumes common in today’s big data and IoT solutions. Pentaho now provides Enterprise customer support to send data to and receive data from Kafka, to facilitate continuous data processing use cases in PDI.

5. Enhanced Support for Popular Hadoop File Formats

Pentaho now supports the output of files in Avro and Parquet formats in PDI, both popular for storing data in Hadoop in big data onboarding use cases.

“Our latest enhancements reflect Pentaho’s continued mission to quickly make big data projects operational and deliver value by strengthening and supporting analytic data pipelines,” says Donna Prlich, Senior Vice President, Product Management, Product Marketing & Solutions, at Pentaho. “Enterprises can focus on their big data deployments, removing the complexity and time involved in data preparation by taking advantage of new, high potential technologies like Spark and Kafka in the big data ecosystem.”

About Pentaho

Pentaho, a Hitachi Group Company, is a leading data integration and business analytics company with an enterprise-class, open source-based platform for diverse big data deployments. Pentaho’s unified data integration and analytics platform is comprehensive, completely embeddable and delivers governed data to power any analytics in any environment. Pentaho’s mission is to help organizations across multiple industries harness the value of all their data, including big data and IoT, enabling them to find new revenue streams, operate more efficiently, deliver outstanding service and minimize risk. Pentaho has over 15,000 product deployments and 1,500 commercial customers today, including ABN-AMRO Clearing, BT, Caterpillar Marine Asset Intelligence, EMC, Landmark Halliburton, Moody’s, NASDAQ and Sears Holding Corporation. For more information visit www.pentaho.com.


Source: Pentaho
