October 15, 2014

EMC, Pivotal Add Compute to Hadoop Data Lake

Storage vendor EMC Corp. and cloud specialist Pivotal have partnered to roll out a new version of their Data Lake Hadoop Bundle. The release adds a compute option to the big data product, along with accelerated analytics that pair scale-out storage and computing with analytics software.

As data lakes gain momentum as scalable repositories for data generated from current and advanced workloads, EMC and Pivotal are positioning Data Lake Hadoop Bundle 2.0 as a tool for plumbing the depths of data lakes to derive value from big data.

The partners said Oct. 15 that version 2.0 includes EMC's data computing appliance, which is billed as a big data engine designed to simplify deployment and scaling of Hadoop and advanced analytics.

The partners said the upgrade is part of a push to deliver Hadoop for predictive analytics across the enterprise. To that end, the data computing appliance in the latest Hadoop Bundle has been optimized for big data workloads and better analytics performance.

Jeremy Burton, EMC's president for products and marketing, noted in a statement that the latest bundled offering targets the growing scale of data lakes and the resulting need to leverage Hadoop for big data applications like predictive analytics.

Bundling computing, storage and analytics makes big data analytics more accessible as data lakes become a preferred storage repository, Pivotal President Scott Yara added in the same statement.

Other components of version 2.0 include EMC's scale-out Isilon storage nodes, Pivotal's enterprise version of Hadoop (Pivotal HD) and HAWQ, Pivotal's parallel SQL query engine for Hadoop. Pivotal's Hadoop distribution is pre-configured and "hardened" on EMC's data computing appliance to provide advanced analytics on Hadoop, the companies said.

Data lakes are being promoted as an emerging data management platform that could help eliminate information silos. The approach includes combining different managed collections of data in an unmanaged data lake.

Critics have attempted to throw cold water on the data lake trend, arguing instead that enterprises will continue to require secure data repositories in the form of data warehouses.

Market analyst Gartner concluded in a recent report that gaps in the data lake model are generating confusion among IT managers about precisely what the storage option can and cannot offer and whether it represents an enterprise-wide big data solution.

The Gartner study concluded that data lakes, unlike traditional data warehouses, "carry substantial risks." One reason is that promoters of data lake technology assume most, if not all, potential customers are skilled at data management and analysis.

Still, embattled IT managers continue to look for greater agility and easier access to data in order to boost performance and speed up data analysis. That is one potential benefit of new tools like the EMC-Pivotal offering.

The partners said Data Lake Hadoop Bundle 2.0 is available immediately.
