Hortonworks Shifts Focus to Streaming Analytics
Hortonworks started life providing a Hadoop distribution that allowed customers to process big data at rest. But these days, the company has shifted much of its attention and resources to streaming analytics, or processing big data in motion.
The signs of the paradigm shift are evident here at the DataWorks Summit. In fact, it starts with the name. Hortonworks' annual June shindig at the San Jose Convention Center used to be called "Hadoop Summit." But clearly Hadoop is no longer the center of gravity in the big data space that it once was.
Instead of announcing a new version of its Hadoop distribution, as it has in the past, Hortonworks used its biggest show to highlight a major new version of its streaming data platform, called Hortonworks Data Flow (HDF).
"Hortonworks is moving from 'We do Hadoop' to 'We do connected data architectures,'" Hortonworks VP of product management Jaime Engesser tells Datanami. "If you look at the streaming analytics space, that's where we've now doubled down."
Nearly half of the biggest Hortonworks Data Platform (HDP) customers have already adopted HDF, Engesser says, while perhaps 30% of the overall Hadoop customer base has picked up the streaming product. "Our adoption rate for HDF is a really big deal," he says. "The growth is really focused around streaming."
The company expects streaming analytics adoption to increase even more with this week’s release of HDF 3.0. The big news with this release is the addition of a Streaming Analytics Manager (SAM), which allows users to develop streaming analytic applications in a graphical, drag-and-drop manner.
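SAM pipelines are built graphically, but the underlying pattern they express is a familiar one: a source feeds processors that filter and aggregate events before handing them to a sink. As a rough illustration of that pattern (not SAM's actual API — the function and field names below are hypothetical), a minimal pipeline over a stream of log events might look like:

```python
from collections import defaultdict

# Illustrative sketch only: SAM wires these stages together with
# drag-and-drop components; here each stage is a plain function.

def source(events):
    """Emit raw events (stand-in for a Kafka or NiFi source)."""
    yield from events

def filter_errors(stream):
    """Processor: keep only events flagged as errors."""
    return (e for e in stream if e["level"] == "ERROR")

def count_by_service(stream):
    """Processor/sink: aggregate error counts per service."""
    counts = defaultdict(int)
    for e in stream:
        counts[e["service"]] += 1
    return dict(counts)

events = [
    {"service": "auth", "level": "ERROR"},
    {"service": "auth", "level": "INFO"},
    {"service": "billing", "level": "ERROR"},
    {"service": "auth", "level": "ERROR"},
]

result = count_by_service(filter_errors(source(events)))
print(result)  # {'auth': 2, 'billing': 1}
```

The pitch of a tool like SAM is that a business analyst can assemble this chain of stages visually instead of writing it in Java or Scala.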
HDF 3.0 also adds Schema Registry, a shared repository of schemas that brings a centralized data governance capability across multiple streaming engines, including Apache Kafka, Apache Storm, and Apache NiFi.
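The idea behind a schema registry is that producers and consumers validate records against a shared, versioned schema instead of hard-coding field assumptions into each application. The toy in-memory registry below illustrates that concept only; the actual Hortonworks Schema Registry is a separate REST service, and the topic and function names here are hypothetical:

```python
# Illustrative sketch of the schema-registry concept: one shared,
# versioned source of truth for record shapes across streaming apps.

registry = {}

def register_schema(topic, fields):
    """Store a new schema version for a topic; return the version number."""
    versions = registry.setdefault(topic, [])
    versions.append(set(fields))
    return len(versions)

def validate(topic, record, version=-1):
    """Check that a record carries exactly the fields of a schema version."""
    return set(record) == registry[topic][version]

v1 = register_schema("truck-events", ["truck_id", "speed"])
ok = validate("truck-events", {"truck_id": 7, "speed": 55.0})
bad = validate("truck-events", {"truck_id": 7})
print(v1, ok, bad)  # 1 True False
```

Because every engine in the pipeline consults the same registry, a schema change is made once and governed centrally rather than patched into each Kafka producer, Storm bolt, and NiFi processor separately.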
The addition of SAM and Schema Registry bolsters HDF's capabilities, Engesser says. "When we started last January, it [HDF] only had NiFi. That's all HDF was," he says. "Now it has NiFi, Schema Registry, Kafka, Storm, Ambari, Ranger, and Atlas. It's got a whole suite of things that are required" to run streaming analytics.
Apache Storm is the default stream processing engine for HDF, but the plan calls for letting users swap it out for other engines. “We just defaulted to Storm because we know it really well and we know it can scale,” Engesser says. The product also supports Spark Streaming today, and Flink and Apex will likely be supported at some point in the future.
The key thing that HDF adds to the streaming analytics equation is ease of development. "If you look at the world through enterprise customers' eyes, they have Storm, they have Spark Streaming, they have Flink, they have Apex," Engesser says. "They've got a million different streaming engines. And all of them require you to have a rock star Java dev to get anything done, or a rock star Scala dev in the case of Spark, etc."
Instead of requiring customers to hire Java or Scala developers, the company decided to automate the development of streaming analytic applications, becoming what Engesser calls "the Tableau for streaming."
"If you look at what Tableau does really well, it says 'Give me a data set, and I can get you to quick insights,'" he says. "We did the same thing with streaming."
The company is now working on making HDF integrate more closely with cloud platforms and data repositories, Engesser says. The idea is to make it dirt simple for AWS or Azure customers to get up and running with HDF and integrate it with their cloud data stores.
The company isn't giving up on Hadoop. But it's clear that Hortonworks' future will hinge on its capability to help customers manage, secure, govern, and ultimately monetize data no matter where it sits.
"You still need data at rest," Engesser says. But "data in motion is a new area that we need to focus on."