Hortonworks Deals to Streamline Hadoop Deployments
Hortonworks today announced the acquisition of Sequence IQ, a Hungarian developer of cloud deployment automation tools for Hadoop. Hortonworks, which is hosting its Hadoop Summit Europe in Brussels this week, also shipped a maintenance release of its Hadoop distribution that includes Apache Ambari 2.0 and officially adds Apache Spark to the mix.
One of the big challenges customers face when implementing a modern Hadoop cluster is just getting it up and running. It’s not such an issue in small clusters that have fewer than 10 nodes. But as the cluster extends beyond 100 or 1,000 nodes, it quickly becomes too expensive and tedious to do the work manually.
A number of companies and open source projects are chasing this problem, including Sequence IQ. From its headquarters in Budapest, the company has developed a pair of products that fit into this space.
The first is Cloudbreak. This product makes it much easier for customers to provision and deploy Hadoop clusters, whether they run in the cloud (Amazon AWS, Microsoft Azure, and Google Cloud are supported), in Docker containers, or on bare metal. The software uses the “blueprints” functionality in Ambari, the Hadoop operations and management console, to let users easily replicate Hadoop cluster setups; support for OpenStack is on the horizon.
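To give a rough sense of what a blueprint looks like, here is a minimal sketch in Python that builds a blueprint-style document. It follows the general layout of Ambari blueprints (a `Blueprints` header plus a list of `host_groups`), but the function name, the cluster name, and the particular choice of components are illustrative assumptions, not anything taken from Cloudbreak itself.

```python
import json


def make_blueprint(name, stack_version, worker_count):
    """Build a dict shaped like an Ambari blueprint.

    Hypothetical helper for illustration only; check the Ambari
    documentation for the authoritative schema.
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                # One master host running the HDFS and YARN masters.
                "name": "master",
                "cardinality": "1",
                "components": [
                    {"name": "NAMENODE"},
                    {"name": "RESOURCEMANAGER"},
                ],
            },
            {
                # Worker hosts; the count is what changes between clusters.
                "name": "worker",
                "cardinality": str(worker_count),
                "components": [
                    {"name": "DATANODE"},
                    {"name": "NODEMANAGER"},
                ],
            },
        ],
    }


# "demo-cluster" is an assumed name used only for this example.
blueprint = make_blueprint("demo-cluster", "2.2", 3)
print(json.dumps(blueprint, indent=2))
```

Because the cluster layout is captured as data, the same description can in principle be reused for a second cluster by changing only the name or the worker count.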
The second Sequence IQ product that caught Hortonworks’ eye is Periscope, which provides auto-scaling capabilities for Hadoop clusters. The software, which also is integrated with Ambari, analyzes various performance metrics for the cluster, and automatically adds nodes as needed, based on policies set by the user.
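The idea of policy-driven auto-scaling can be illustrated with a short Python sketch. This is not Periscope’s actual logic or API; the `ScalingPolicy` fields, the `nodes_to_add` function, and the `pending_containers` metric name are all hypothetical, chosen only to show how a user-set threshold might translate into a decision to add nodes.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """A hypothetical user-defined scaling rule."""
    metric: str        # name of the cluster metric to watch
    threshold: float   # scale up when the metric exceeds this value
    step: int          # how many nodes to add per scaling event
    max_nodes: int     # upper bound on cluster size


def nodes_to_add(policy, metrics, current_nodes):
    """Return how many nodes the policy asks for, given current metrics."""
    value = metrics.get(policy.metric)
    # No reading, or a reading at or below the threshold: do nothing.
    if value is None or value <= policy.threshold:
        return 0
    # Never grow past the configured maximum cluster size.
    headroom = max(policy.max_nodes - current_nodes, 0)
    return min(policy.step, headroom)


policy = ScalingPolicy(metric="pending_containers", threshold=50, step=5, max_nodes=12)
print(nodes_to_add(policy, {"pending_containers": 80}, current_nodes=10))
```

In this sketch the policy caps growth at `max_nodes`, so a cluster that is already near its limit receives fewer nodes than the configured step.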
Cloudbreak and Periscope fit very nicely together and complement the power of Ambari to streamline the management of Hadoop clusters, says Hortonworks vice president of product management Tim Hall. “You can imagine folks spinning up a 1,000 node cluster. That’s a lot of work to do. Just doing one step on every machine is too many,” he says. “So whatever we can do to help streamline automation, that’s been the focus around Ambari.”
Ambari has matured over the past year and gained more powerful capabilities, including Blueprints extensibility mechanisms and the new Alerts framework in Ambari 2.0, which Periscope can use to trigger the addition of Hadoop nodes.
“We’re seeing evidence that the community is understanding how to take advantage of the Blueprint extensibility mechanisms and really leveraging it to the maximum capability, which is awesome,” Hall says. “The Sequence IQ team has been wonderful to collaborate and work with and we’re excited to bring them into the family.”
Hortonworks plans to contribute the Sequence IQ products back to the Hadoop community, either by donating the intellectual property (IP) to an existing Apache Software Foundation project–potentially Apache Ambari itself–or by incubating a new one, Hall says. In any event, the Sequence IQ capabilities will be added to a future release of Hortonworks’ Hadoop distribution, but only for customers who have purchased Enterprise Plus support subscriptions for HDP, he says.
Hortonworks is happy to add the Sequence IQ team to its staff, and is eager to use Sequence IQ’s existing business to help it establish a “beachhead” in Europe, Hall says. Going forward, Hortonworks will look at ways to leverage some of the work Sequence IQ is doing to integrate Hadoop with OpenStack.
“Originally the integration between OpenStack and Hadoop was through the Sahara plugin. That worked great when Hadoop was MapReduce and HDFS as one project,” Hall says. “What we’ve seen now over time is, as there’s more and more componentry in the Hadoop ecosystem that’s being deployed as a platform, it was causing a ripple effect in terms of what needed to be exposed and managed through that Sahara plugin.”
Instead of creating dozens of individual integration points between OpenStack and Hadoop for all the various Hadoop processing engines a customer might use–Hive, HBase, Cassandra, Spark, MapReduce, Tez, etc.–the Sequence IQ team is taking a different approach, using Ambari’s blueprint API as the starting point for defining how Hadoop will deploy within OpenStack.
“From our perspective, the approach that the Sequence IQ team is taking makes more sense, given the way the Hadoop ecosystem is headed,” Hall says. “Part of it has to do with what is the right binding point into the OpenStack infrastructure for deployment of Hadoop… If it was just one Hadoop project being bound to Sahara and integrating into OpenStack, that makes some sense. But when you’re talking about 20 other components now that make up the platform, exposing the details of all 20 of those things into the Sahara plugin didn’t make sense architecturally. The churn that comes along with what’s happening through inclusion or removal of various points, just meant that there was going to be a lot of investment in the Sahara plugin to keep up.”
Hortonworks also announced the first maintenance release for HDP 2.2, which shipped last fall. The new release includes Ambari 2.0, which Hortonworks unveiled last week, in addition to various other enhancements, including support for Apache Spark version 1.2.1. It’s the first time Hortonworks has officially supported the popular in-memory computing framework in its Hadoop distribution.
The company also proposed a new Apache project for the Data Governance Initiative that it started earlier this year. Apache Atlas, as the DGI project would be known, aims to help rein in some of the data chaos that occurs on Hadoop. Specifically, Atlas will provide data classification, centralized auditing, search and lineage capabilities for Hadoop, as well as security and policy engines.
The new HDP release also includes Apache Ranger, a security management tool for Hadoop that came out of Hortonworks’ previous acquisition of XA Secure. Hortonworks has also streamlined the deployment of the Kerberos authentication subsystem in HDP; it can now be up and running in just a few clicks, the company says.