IBM Rolls Out Power-Based ‘Data Engine’
A new IBM “data engine” for running key open source analytics frameworks is built on the company’s recently added line of OpenPower Linux servers customized for big data workloads.
Technical details of the IBM data engine for Hadoop and Spark were released this week (Feb. 9). The company said the analytics platform would be available on March 18.
IBM (NYSE: IBM) said its data engine targets retailers that increasingly combine customer buying patterns with real-time social media feedback, as well as manufacturers analyzing event data. It is also aimed at IT and security teams that analyze log data in search of breaches.
A range of data engine configurations will be offered based on IBM’s line of Power8 servers with up to 1 TB of memory. Configurations range from a “starter” system delivering more than 50 TB of usable storage to multi-rack setups providing up to 1.3 petabytes of raw storage per rack. Each comes with a standard triple-replica Apache Hadoop or Spark configuration, IBM said.
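Raw and usable capacity are not directly comparable, because triple replication stores every data block three times. The back-of-the-envelope sketch below assumes a replication factor of exactly 3 and ignores file-system overhead, scratch space, and compression, none of which IBM’s announcement quantifies. The helper names are hypothetical and are not part of any IBM tool.

```python
# Hypothetical sizing helpers. Assumption: every block is stored three times
# (an HDFS-style replication factor of 3); file-system overhead, temporary
# space, and compression are ignored.
REPLICATION_FACTOR = 3


def usable_capacity_tb(raw_tb: float, replication: int = REPLICATION_FACTOR) -> float:
    """Unique data (TB) that fits in raw_tb of disk when each block is replicated."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_tb / replication


def raw_capacity_tb(usable_tb: float, replication: int = REPLICATION_FACTOR) -> float:
    """Raw disk (TB) needed to hold usable_tb of unique data with replication."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return usable_tb * replication


if __name__ == "__main__":
    rack_raw_tb = 1300.0      # 1.3 PB of raw storage per rack (decimal units)
    starter_usable_tb = 50.0  # the "starter" configuration's usable figure
    print(f"1.3 PB raw per rack -> ~{usable_capacity_tb(rack_raw_tb):.0f} TB of unique data")
    print(f"50 TB usable -> at least {raw_capacity_tb(starter_usable_tb):.0f} TB of raw disk")
```

Under these assumptions, a rack with 1.3 PB of raw storage holds roughly 433 TB of unique data, and the starter system’s 50 TB of usable storage implies at least 150 TB of raw disk. Real deployments would net somewhat less once overhead is counted.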
The data engine also integrates cluster management and analytics software (the latter must be ordered separately) tuned for Apache Hadoop and Spark workloads. The data engine is “based on a set of standard building blocks that can be tailored to fit the data size, throughput, and scale required for the target analytics scenarios,” IBM said. “Spark workloads benefit from large, fast memory and lots of processor threads,” the company also noted in its hardware announcement. “Hadoop workloads require large storage capacity, high-speed networks, and a resilient cluster file system.”
Extending its foray into the open-source world, IBM said its Hadoop platform includes components “aligned with the Open Data Platform consortium,” which it joined last year as a founding member. Along with Hadoop components, the open platform release includes the Apache Ambari deployment and management tool.
The company announced a series of initiatives last February aimed at improving “in-Hadoop analytics” and promoting big data standards. It also released an open platform version of its Apache Hadoop distribution that runs on its Power and Intel x86 platforms.
Among the Power8-based server configurations being offered is a “standard” data node for Hadoop workloads built on a 2.92-GHz server with 128 GB of memory. For memory-intensive Spark workloads, DRAM is doubled while storage is reduced.
To address network bottlenecks that emerge as data volumes grow and more data moves between nodes, IBM also said it would offer several 10-Gb Ethernet switch configurations to handle the “fast movement of data at scale” required for Hadoop and Spark clusters.
The big data engine based on its Power servers also reflects IBM’s growing investment in Apache Spark as a leading in-memory analytics platform. Last week, it unveiled a host of new cloud-based data services designed to bolster its hosted Apache Spark business with NoSQL, graph, and machine learning capabilities.