Hortonworks Unveils New Offerings for AWS Marketplace
Hortonworks today took the wraps off new big data services that run on the Amazon Web Services (AWS) Marketplace. The Hadoop, Spark, and Hive services are pre-configured, and are designed to get users up and running quickly and easily.
Hortonworks (NASDAQ: HDP) Data Cloud for AWS, as the offering is known, uses a subset of the company’s Hadoop distribution, called the Hortonworks Data Platform (HDP), that’s connected to an Amazon (NASDAQ: AMZN) S3 data store. The services can be spun up through the AWS Marketplace with a push of a button, provided you already have an account. That eliminates the need to manually configure a Hadoop environment, and it also simplifies billing.
The two vendors are unveiling three shared services. The first is a data science exploration and visualization environment based on Apache Spark that features the Zeppelin data science workbook. The second is a data preparation and ETL environment that utilizes Hadoop and Hive on Tez. The third is a data analytics and reporting configuration that uses Hive’s new Live Long and Process (LLAP) mode, which speeds SQL-based analytics.
The offerings were designed to remove the installation and configuration barriers that may prevent users from taking advantage of some of the useful capabilities in a modern Hadoop environment, says Shaun Connolly, chief strategy officer for Hortonworks.
“It’s really tuned [for] what I would say are the fairly ‘rinse and repeat’ use cases that we’re seeing in the cloud,” he tells Datanami. “Ease of use, productive user experience, automated and optimized integration of the HDP stack, as well as the integration with Amazon, are hallmarks of the offering.”
Customers who sign up for Hortonworks Data Cloud for AWS services will not see all the screens in the HDP product, and will not have access to all of its capabilities. You can forget about configuring a long-running MapReduce batch job, or spinning up a Storm cluster to analyze incoming data. And you won’t have access to the many configuration options that on-premise users can get.
That’s by design. “Those folks who want to turn every knob or dial–they’re already doing it by deploying HDP on AWS as an infrastructure as a service (IaaS)” in the Elastic Compute Cloud (EC2) environment, Connolly says.
Ease of use and quick deployments are major design points. To that end, the “stack adviser” feature of Ambari will ensure that the underlying deployment on AWS iron is optimal from an EC2 spending perspective. “The end user doesn’t have to be bogged down with those details,” Connolly says. “You provide a handful of parameters, you choose your instance type so [you] can optimize the spend, and you’re off and going.”
Customers will have hourly and annual billing options, both of which are handled through the AWS Marketplace. That allows data scientists and analysts to quickly get in, do the work, and then spin down the environment to stop the meter from running.
While Hortonworks Data Cloud for AWS may function as an onramp to help familiarize new users with the HDP product, it’s not just for data science experiments, Connolly says. “We expect production workloads that are put on a schedule to run regularly,” he says. “It has a full command line interface, so we expect people to schedule their pipelines and run them through this…. It’s not just dev-test.”
It’s the second major cloud-based offering for Hortonworks. The company already provides the Hadoop distribution that underlies the HDInsight Hadoop service from partner Microsoft (NASDAQ: MSFT), which is available on the Azure cloud.
The HDInsight offering on Azure is a bit more full-featured at this point. It features long-running Storm and HBase clusters, for example, which are not (yet) targeted for the AWS Marketplace service. But Connolly sees the AWS service expanding in the future, perhaps with connections to AWS’ relational database service.
Connolly also sees Hortonworks DataFlow, the platform for managing the movement of big data, playing a role. “It winds up being the data orchestrator of getting data from where it’s born to where it needs to be,” he says. “Its ability to get data into the cloud services, into and out of S3 and into Azure storage.”
In the long run, Hortonworks’ strategy is to enable customers to move and process data in on-premise and cloud environments, and manage permissions, security, and governance all from a single location. The company isn’t quite there, but that’s the plan.
“Naturally this expands to the notion of a data plane. Effectively that’s where we’re headed,” he says. “That’s not part of [today’s] announcement, but that notion of shared security and governance is absolutely what we’re going to be focused on. That’s what enterprise customers want. They want to have visibility across on-prem and cloud deployments.”