October 10, 2017

Mature MADlib Moves Up Apache Development Chain

George Leopold

As the in-database analytics library MADlib makes inroads in emerging cloud-based AI applications, the open source initiative is moving up the development chain as a priority project within the Apache Software Foundation.

Pivotal, which last month released the latest version of its MADlib-based Greenplum analytics platform, said the promotion to a “top level” Apache project would help accelerate development of in-database machine learning as well as advanced analytics for its Greenplum database.

Researchers with several universities along with data scientists at the former EMC Greenplum operation helped develop the initial MADlib code base. (Pivotal was spun out from EMC prior to Dell Technologies’ acquisition of the storage giant.)

MADlib entered the Apache Incubator in September 2015 and graduated to “top level” project status in July. MADlib 1.12 was released in August, incorporating new graph analytics, new sampling algorithms and an artificial neural network.

Pivotal then followed with the release of its Greenplum 5 database, which also seeks to enable analytics in multi-cloud deployments. “We have seen our customers successfully deploy MADlib on large scale data science projects across a wide variety of industry verticals,” Elisabeth Hendrickson, Pivotal’s vice president for data R&D, noted in an Oct. 6 blog post.

“We anticipate increased adoption in the enterprise given the mature level of the code base and the active developer community,” Hendrickson added.

“The ability to perform in-depth and detailed analytics, on both structured and unstructured data, using SQL enables MADlib to be applicable in scenarios where others simply can’t compete,” added Jim Jagielski, vice chairman of the Apache Foundation.

Last month’s release of Greenplum provided another boost for the open source project. The latest version is intended as a single platform for executing and scaling compute-intensive analytical workloads. Pivotal also stressed that Greenplum 5 aims to eliminate data silos by integrating traditional and advanced analytics on a single platform that can scale across hybrid cloud infrastructure.

The database platform is certified and available to run on public clouds including Amazon Web Services (NASDAQ: AMZN), Microsoft Azure (NASDAQ: MSFT), Google Cloud Platform (NASDAQ: GOOGL) along with private clouds based on VMware vSphere (NYSE: VMW) and OpenStack.

Along with the Greenplum analytic data warehouse, SQL-based MADlib also supports PostgreSQL databases.

As a primary backer of the MADlib project, Pivotal pulled the plug on its proprietary big data strategy just over two years ago and announced a major repositioning of its core products, including Greenplum. The San Francisco-based analytics vendor also announced new partnerships last month, including deals with Chinese cloud infrastructure providers Alibaba (NYSE: BABA) and Tencent (HKG: 0700). They will integrate Greenplum database technology into their cloud infrastructure.

In announcing the deal, Tencent noted that MADlib “has a large user base in China.”
