Elastic Stack Searches for Bigger Data Problems
Elastic is best known as the commercial vendor behind Elasticsearch, the open source search engine that’s widely used around the world. But with this month’s release of Elastic Stack V5, the company is staking a claim with a wider range of big data processing capabilities, including graph analytics and visualization. Next up: unsupervised machine learning.
There’s no shortage of big data stacks on the market. You’ll find them in the Hadoop world, where distributors include dozens of processing engines that sit atop HDFS. You’ll find them in the NoSQL world, where vendors have bolted graph analytic engines onto schema-less databases. Then there’s AMPlab, which developed its Berkeley Data Analytics Stack (BDAS) with Spark and Mesos.
With so many stacks, one wonders if there’s space for one more, let alone one built around something as humble as a search engine. But according to Elasticsearch creator Shay Banon, who’s also the co-founder and CTO of the commercial open source software company Elastic, expanding beyond search has always been part of the plan.
“When I wrote the first few lines of Elasticsearch, my vision for search was more than something that powers a search box on a website,” Banon says. “I was frustrated because whenever somebody was talking about search, they were thinking about enterprise search and things like how can I index Word docs from SharePoint, how can I build a search box.”
Elasticsearch, which traces its roots back to 2004 when Banon modified Doug Cutting’s original Lucene search engine, ostensibly is a full-text search engine built in Java that stores data as JSON documents in a NoSQL-like data store and features a RESTful API for simplified integration.
But in practice, it’s become much more than that–especially with the addition of log aggregation (Logstash) and visualization (Kibana) add-ons that Elastic grouped together into what the community called the ELK stack. With the launch of Elastic Stack V5 on November 17, Elastic has simultaneously embraced the core “stack-iness” of this enterprise while also moving away from the ELK name.
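To make the "JSON documents plus RESTful API" description concrete, here is a minimal Python sketch of how a client might prepare a request that stores one document. The host (`http://localhost:9200`), the index name `articles`, the document fields, and the helper name `build_index_request` are all assumptions for illustration, and the exact URL path varies between Elasticsearch versions. The request is only constructed, never sent.

```python
import json
from urllib.request import Request


def build_index_request(base_url, index, doc_id, document):
    """Build (but do not send) an HTTP PUT that would store one JSON document.

    Hypothetical helper: the path shape used here is an assumption and
    differs between Elasticsearch versions.
    """
    body = json.dumps(document).encode("utf-8")
    url = f"{base_url}/{index}/_doc/{doc_id}"
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Illustrative document; the field names are made up for this sketch.
req = build_index_request(
    "http://localhost:9200",
    "articles",
    "1",
    {"title": "Elastic Stack Searches for Bigger Data Problems", "year": 2016},
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it to a running cluster; that step is left out so the sketch runs without one.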
Goodbye ELK, Hello Elastic Stack
The main difference between the new Elastic Stack and the ELK stack is the addition of Beats, an independent product designed to capture operational data. There was no graceful way to incorporate a “B” into ELK, so Elastic Stack was born. The company is also shipping X-Pack, a commercial extension that provides security, monitoring, alerting, reporting, and entity graph capabilities, which the company first announced in March. Each of the individual products has also been updated with more capabilities.
Banon says V5 is the first time the company has aligned all the products together as a single entity. “Historically, because the products slowly evolved and joined our company, they had different versions, and it was hard to understand what works with what, and the integration points were not as smooth as they could be,” he tells Datanami. “In V5 we effectively aligned all the versions together, so we’re treating it as a single component or entity.”
Moving forward, the company will be looking to provide machine learning capabilities as a feature of the stack. Elastic, which has received more than $100 million in funding since it was founded in 2012, acquired Prelert earlier this year to provide unsupervised machine learning and behavioral analytics to data stored in Elasticsearch.
The Prelert acquisition will give Elastic Stack customers operational analytics and anomaly detection as features that are enabled out of the box, Banon says.
“Imagine taking a stream of data, things like orders per minute or the number of API calls or the number of visitors to your website, and be able to, without training, to detect seasonality around it and the trend of how it behaves, and basically model it, and through that to be able to identify things like anomalies and alert you when there is an anomaly,” Banon says.
“Even though machine learning is one of the more hyped aspects [of big data] if you look at the last year, they [Prelert] have had quite a few users successfully using the system and it proves that it actually has value,” he continues. “We’ve acquired the company and now we’re furiously working toward making it into something. We consider it to be a feature in the stack, not a whole new problem.”
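The idea Banon describes, modeling a metric's seasonal pattern and alerting on departures from it, can be illustrated with a deliberately simple Python sketch. This is not Prelert's algorithm, which the article does not describe. It is a swapped-in baseline that compares each point with the median of earlier points at the same position in the cycle. The function name, the `period`, `threshold`, and `min_scale` parameters, and the sample data are all assumptions.

```python
from statistics import median


def seasonal_anomalies(series, period, threshold=5.0, min_scale=1.0):
    """Return indices of points that depart sharply from their seasonal baseline.

    Illustrative only: each value is compared with the median of earlier
    values at the same phase of the cycle, scaled by their median absolute
    deviation (with a floor of min_scale so flat history does not over-alert).
    """
    flagged = []
    for i, value in enumerate(series):
        # Earlier observations at the same position within the cycle.
        history = series[i % period:i:period]
        if len(history) < 3:
            continue  # not enough past cycles to form a baseline
        center = median(history)
        spread = median([abs(h - center) for h in history])
        scale = max(spread, min_scale)
        if abs(value - center) > threshold * scale:
            flagged.append(i)
    return flagged


# Made-up "orders per minute" data: a repeating 4-step cycle with one spike.
orders_per_minute = [10, 20, 30, 20] * 5
orders_per_minute[18] = 90
print(seasonal_anomalies(orders_per_minute, period=4))
```

Unlike the system Banon describes, this sketch needs the cycle length supplied up front; detecting seasonality automatically is the harder part that a production system would have to solve.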
While the Elastic Stack is getting more advanced capabilities, customers will not suffer from the added complexity that more code brings, as is happening in the Hadoop ecosystem, Banon says.
“First of all, we’re in control of all the products,” he says. “It helps us when we go to create a single cohesive vision for all the products, compared to the Hadoop ecosystem, which has multiple vendors, all of them pulling in different directions and thinking that one product within the stack is more important than the other.”
“I would also say,” Banon continues, “that we’ve been very careful around adding new products. They have to justify why they need to exist. So far, we only have four. I think Hadoop has 100. So trying to integrate and make four products work well together, whether it’s from the security perspective or usability….that’s something that I think we’ve managed to do well.”
Banon doesn’t expect Elastic Stack to replace many Hadoop clusters. The Hadoop platform and ecosystem is much wider and more diverse than what Elastic has done with its stack, and is better situated to be the core platform where data scientists can automate the ingestion, storage, preparation, and analysis of vast amounts of data.
In most instances, the Elastic Stack and Hadoop platforms will co-exist–and of course, you can run a supported instance of Elasticsearch within a Hadoop cluster, just as you can run Elasticsearch’s main competitor, Apache Solr, to search against data stored in HDFS.
Instead, Banon sees Elastic Stack eating into Splunk’s proprietary model. You can also see a parallel between Elastic Stack and what the NoSQL and NewSQL database vendors are doing by delivering operational analytic capabilities that ride atop core transaction processing systems. Elastic Stack has also gone “upstack” to deliver some compelling analytic capabilities for data that’s living in the Elastic repository anyway.
Data Stack Smackdown
But if Banon has his way, you’ll find yourself putting more data into the Elastic document repository, just to take advantage of the capability to process unstructured data in the fast and intuitive manner that the search engine can deliver.
The forthcoming unsupervised machine learning enhancements from the Prelert acquisition, not to mention the Spark Streaming and graph capabilities that are already available, speak to that. It’s all about enabling flexibility and giving users more of what they want, Banon says.
“If you think about a search engine, indexing large volumes of content, specifically unstructured data, you don’t necessarily know exactly what you’re going to query. You just want to know that you can query everything,” he says. “You don’t have to predefine how you’re going to slice and dice the data with a search engine. You can zoom in and zoom out and potentially aggregate the data in a free-form manner across any attributes.”
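The free-form slicing Banon describes can be pictured as a query body assembled at request time from whichever attributes the user picks. The Python sketch below builds such a body as a plain dictionary using the `match` query and nested `terms` aggregations of the Elasticsearch query DSL. The helper name `build_terms_aggregation` and the field names (`message`, `host`, `status`) are assumptions for illustration, and nothing is sent to a cluster.

```python
import json


def build_terms_aggregation(fields, query_text=None, text_field="message"):
    """Build a search body that groups matching documents by each field in turn.

    Hypothetical helper: every field after the first becomes a
    sub-aggregation of the previous one, so the grouping order is chosen
    at query time rather than predefined.
    """
    body = {"size": 0}  # return only aggregation results, no individual hits
    if query_text is not None:
        body["query"] = {"match": {text_field: query_text}}
    level = body
    for field in fields:
        name = f"by_{field}"
        level["aggs"] = {name: {"terms": {"field": field}}}
        level = level["aggs"][name]  # nest the next field one level deeper
    return body


# Group documents mentioning "timeout" by host, then by status within each host.
search_body = build_terms_aggregation(["host", "status"], query_text="timeout")
print(json.dumps(search_body, indent=2))
```

Reversing the list to `["status", "host"]` pivots the same data the other way without any change to how it was indexed, which is the flexibility the quote points to.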
With more than 2,500 paying customers, an active community of 72,000 developers, and 75 million downloads since 2012, it would seem that Banon has hit upon a successful recipe for bringing the power of search to the masses.
“Even before Elasticsearch, I was using search in banks, for high-frequency trading [applications] because it just ended up being faster than anything you can imagine,” he says. “One of the goals that I had was to build a system like Elasticsearch to show users what they can do with it. And obviously, it had the level of success that it has because it actually solves a useful problem.”