Taming Apache Storm for Real-Time Analytics
Apache Storm is gaining a foothold among organizations looking to do real-time analytics on streaming data. However, the difficulty in working with the distributed processing framework is proving to be a major hurdle to Storm adoption. Now, a company called Impetus says it’s simplifying development on Storm with a new product.
Released as open source by Twitter several years ago, Apache Storm plays a key role as the real-time processing layer of the emerging big data technology stack. While the distributed framework is powerful, it’s also challenging for developers to write to, and tough for IT folks to configure and run.
Simplifying Storm is one of the challenges that Impetus hopes to address with StreamAnalytix, which the company launched at the recent Strata + Hadoop World conference in San Jose. According to Impetus CTO Vineet Tyagi, the shrink-wrapped product masks much of the complexity of working with Storm and allows users to build real-time applications within a visual drag-and-drop interface.
“The problem that Storm and Kafka and all the real-time use cases have…is the skills are not available,” says Tyagi, who also heads up the company’s innovation lab. And when companies do have people with the necessary skills, he says, the time it takes to develop and deploy working applications is often too long.
StreamAnalytix addresses these challenges by providing developers with a pattern-driven platform for building real-time analytic applications atop Apache Storm. “There’s a visual application designer that basically lets you drag, drop and create and express your real time application,” Tyagi tells Datanami. “That’s pretty much the only thing you have to do.”
Instead of requiring users to write a real-time streaming analytics application from scratch atop Storm, Impetus has done much of the grunt work for them. Users can choose from more than 130 different message types that are already supported by the platform, such as HDP log files or TCP/IP messages. Users can stitch data coming in via Kafka with other pieces of data, process them in certain ways, and then output them to storage, all as a configurable workflow.
Users “express” their application components within StreamAnalytix and “stitch” them together, rather than writing them from scratch using spouts and bolts in a Storm topology, Tyagi says. The software brings built-in support for running machine learning algorithms on fast-moving data, as well as the capability to create workflows and generate alerts to downstream applications. Users can choose HDFS, HBase, or Cassandra as the persistence layer.
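For readers who have not worked with Storm, the following toy Python sketch shows the general shape of the spout-and-bolt dataflow that StreamAnalytix is meant to hide. It is an illustration under loose assumptions, not Storm's API and not how StreamAnalytix works internally: a real Storm topology is typically written in Java against Storm's TopologyBuilder and distributed across a cluster, while this version runs in a single process. The function names (log_spout, parse_bolt, alert_bolt, run_topology) are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical, single-process imitation of a Storm-style topology.
# This is NOT the Storm API; it only mirrors the spout -> bolt -> bolt dataflow.

Bolt = Callable[[Iterator[dict]], Iterator[dict]]


def log_spout(lines: Iterable[str]) -> Iterator[dict]:
    """Spout: the data source; emits one tuple (here a dict) per log line."""
    for line in lines:
        yield {"raw": line}


def parse_bolt(stream: Iterator[dict]) -> Iterator[dict]:
    """Bolt: splits each raw line into a severity level and a message."""
    for tup in stream:
        level, _, message = tup["raw"].partition(" ")
        yield {"level": level, "message": message}


def alert_bolt(stream: Iterator[dict]) -> Iterator[dict]:
    """Bolt: passes only ERROR tuples downstream, e.g. to trigger an alert."""
    for tup in stream:
        if tup["level"] == "ERROR":
            yield tup


def run_topology(source: Iterator[dict], bolts: List[Bolt]) -> List[dict]:
    """Wires the spout through each bolt in order and collects the output."""
    stream = source
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)  # stand-in for a storage sink such as HBase


if __name__ == "__main__":
    lines = ["INFO started", "ERROR disk full", "WARN slow", "ERROR timeout"]
    print(run_topology(log_spout(lines), [parse_bolt, alert_bolt]))
```

Even in this toy form, every component and every connection between components must be coded by hand; a visual tool replaces that wiring with configuration.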
The software takes the worry out of configuring and deploying a real-time application based on Storm, Tyagi says. “You can be up and running in under 10 minutes,” he says. “So it really takes real-time streaming application development to the next level.”
Based in Los Gatos and India, Impetus is a systems integrator with clients in a number of industries. About 10 years ago, the company founded an innovation lab in its India office to try to get ahead of emerging technologies like Hadoop. The company has built some Hadoop applications for its clients—including SQL and ETL offloading from enterprise data warehouses.
As part of the labs initiative, Impetus identified real-time streaming applications using very low-latency processing as one of the places where it could differentiate itself. The potential of technologies like Storm is very compelling, but some of the rough edges are slowing users down.
Larry Pearson, the company’s vice president of marketing, relayed an exchange he had on the Strata expo floor. “I had the chief architect for one of the largest credit card companies in the world come into the booth,” Pearson says. “He said, ‘I’m one of the first 200 developers to ever develop an application on Apache Storm.’ I asked, ‘How’d that go for you?’ And he immediately said, ‘I don’t ever want to do it again.’”
StreamAnalytix was in beta testing at various clients from last August through the start of the year. The early adopters tested the software for log analytics, streaming ETL, and real-time natural language processing. Now that it’s generally available, Pearson expects additional use cases to emerge, including serving as a big and fast enterprise service bus.
“If you talk about the data-driven enterprise, it has to have a nervous system,” Pearson says. “StreamAnalytix potentially could be managing pipelines of data from any source to any target repository, and doing all kinds of cool stuff with it in between. People are really intrigued with that. It’s opening up a whole vista of possible solutions that frankly, when we were designing it, we didn’t think about.”
StreamAnalytix supports Apache Storm today. In the future, it will support Spark Streaming as well.