October 6, 2015

Pivotal Opens Up HAWQ, MADlib

George Leopold

Pivotal Software has turned over its SQL-on-Hadoop engine along with its MADlib machine-learning tool to the open source community as it seeks to extend the reach of its interactive SQL engine deeper into the Hadoop ecosystem.

Pivotal, which announced in April it was teaming with Hortonworks by combining its big data suite with its partner’s Hadoop platform, said recently it is contributing its HAWQ engine and MADlib framework to the Apache Software Foundation, giving each “incubation” status within the open source group.

As hyperscalers like Netflix build data infrastructure that is tied directly to applications, Pivotal said its contribution of HAWQ and MADlib provides a proven SQL engine that would fill in missing building blocks in the Hadoop ecosystem.

HAWQ incorporates the SQL processor and relational query engine of Pivotal’s original Greenplum database. “Greenplum on Hadoop has evolved significantly to a system recast in terms of Hadoop,” San Francisco-based Pivotal noted in a blog post announcing the open source contributions.

MADlib emerged from a collaboration among researchers at the University of California at Berkeley, the University of Wisconsin and the University of Florida, along with engineers and computer scientists at Pivotal. Designed for in-database analytics, MADlib leverages the massively parallel processing capabilities of the Greenplum database and HAWQ.

The open source contribution represents the “first big step toward building not only a Hadoop Native SQL engine, but ultimately an entire Hadoop Native, data center-class, high performance analytic database infrastructure,” Pivotal asserted.

It also cited the transformation of the database industry driven in part by the rapid rise of mobile and Internet of Things workloads along with the meshing of data with continuous delivery of applications. Those factors have combined to make Hadoop “the fundamental substrate of new generation data warehousing,” the company said.

The Hadoop partnership with Hortonworks announced earlier this year is designed in part to move HAWQ away from a proprietary management and configuration framework to an open source, Hadoop-native environment. Pivotal claimed that would reduce the total cost of ownership in managing the Hadoop stack, including Pivotal HAWQ.

Meanwhile, MADlib is positioned as an open source library for scalable in-database analytics and is designed to provide “data-parallel implementations” of mathematical, statistical and machine learning methods for structured and unstructured data. The framework uses shared-nothing, distributed, scale-out architectures to offer a toolset for analytics problems involving very large data sets. MADlib is SQL-based and supports PostgreSQL as well as Apache HAWQ and Pivotal Greenplum databases.
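As a rough illustration of what a “data-parallel implementation” on a shared-nothing system can look like, the Python sketch below fits a one-variable least-squares line by letting each segment summarize only its own rows and then merging the summaries. It is not MADlib code and not MADlib’s API: the names `Stats`, `transition`, `merge`, `final` and `fit`, and the sample data, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """Running sums that are sufficient to fit y = slope * x + intercept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0


def transition(rows):
    """Runs on one segment and scans only that segment's local rows."""
    s = Stats()
    for x, y in rows:
        s = Stats(s.n + 1, s.sx + x, s.sy + y, s.sxx + x * x, s.sxy + x * y)
    return s


def merge(a, b):
    """Combines two partial summaries by adding their sums."""
    return Stats(a.n + b.n, a.sx + b.sx, a.sy + b.sy,
                 a.sxx + b.sxx, a.sxy + b.sxy)


def final(s):
    """Turns the merged summary into (slope, intercept).

    Assumes at least two distinct x values, so the denominator is nonzero.
    """
    denom = s.n * s.sxx - s.sx ** 2
    slope = (s.n * s.sxy - s.sx * s.sy) / denom
    intercept = (s.sy - slope * s.sx) / s.n
    return slope, intercept


def fit(segments):
    """Summarizes every segment independently, then merges and finalizes."""
    partials = (transition(rows) for rows in segments)
    return final(reduce(merge, partials, Stats()))


# Hypothetical rows lying on y = 2x + 1, spread unevenly over three segments.
segments = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]
slope, intercept = fit(segments)
```

Because only the small summaries cross segment boundaries, the raw rows never have to move, which is the property that lets this style of computation scale out with the data.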

The library’s SQL APIs are designed to let it work with a wide range of data stores and SQL engines while providing a common language on which to build. Pivotal said the tool kit includes algorithms for classification, regression, clustering, topic modeling, association rule mining, descriptive statistics and validation.
