EraDB Says Elasticsearch Clone Is More Scalable, Easier to Run
EraDB today took the covers off EraSearch, a distributed log management tool built atop the startup’s S3-based database service. The company claims the Kubernetes-based EraSearch offering is API-compatible with Elasticsearch, but is more scalable and easier to manage than the popular open source log management tool it seeks to replace.
EraDB CEO Todd Persen and his co-founder, CTO Robert Winslow, didn’t target Elasticsearch with their new product because they think Elasticsearch is a bad product. In fact, Persen recognizes that Elasticsearch is a very useful tool that has enabled lots of people to get value out of their log data.
But according to Persen, the limits on Elasticsearch’s ability to scale to today’s massive data volumes are not going to be addressed any time soon in the Java-based product, which is now 12 years old. For some Elasticsearch customers, that is the writing on the wall that it’s time to move on, he says.
“It’s very expensive from an infrastructure perspective, and it’s also very complicated,” says Persen, who previously was the CTO and co-founder of time-series database provider InfluxData before starting EraDB, which also is a time-series database. “You need ops teams who are constantly taking care of the systems, managing node health, managing disks, and how data is distributed around the cluster.”
When you factor in Elastic’s recent decision to change its license, you find more people starting to look for alternatives, Persen says.
“There’s a lot of folks who chose it because it’s open source, but now no longer feel like they’re supporting a true open source product,” he says. “They say, I’ve banked all my energy on this thing, but it seems like it’s changing out from under me. Should I go with something else? Is this time to switch?”
EraDB is hoping to position its EraSearch offering as something they can switch to without a lot of pain. Because the Elasticsearch query API was so well documented, Persen and his team were able to write their own software that adheres very closely to that API. Persen says customers can basically drop EraSearch in, and users won’t even know the difference.
EraSearch replicates the Elasticsearch query API for all log queries, right down to having the same input and output and the same query syntax. It supports all of the same tools that Elasticsearch uses to input log data, including Logstash, Telegraf, Vector, and Kafka, and supports some of the same tools for visualizing output, such as Kibana, Grafana, and Drill.
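To make the compatibility claim concrete, here is a minimal sketch of the kind of request bodies a log pipeline sends to an Elasticsearch-style API: a Query DSL search body and an NDJSON payload for the `_bulk` endpoint. The formats are Elasticsearch's documented ones; the index name, field names (`message`, `@timestamp`), and helper function names are illustrative assumptions, and the article does not say which parts of the API EraSearch implements beyond log queries.

```python
import json


def build_search_body(term, since="now-15m", size=50):
    """Elasticsearch-style Query DSL: full-text match plus a time-range filter.

    The field names "message" and "@timestamp" are assumed, not taken from the article.
    """
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "must": [{"match": {"message": term}}],
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        },
    }


def build_bulk_payload(index, docs):
    """NDJSON body for a _bulk request: one action line, then one document line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

If the API parity holds as described, the same two bodies could be posted unchanged to either system's `_search` and `_bulk` endpoints, with only the base URL differing.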
EraDB has not tried to replicate all of the other things that Elasticsearch does, such as document search and machine learning. It is focused only on log management and related observability use cases, including security. But for that log management component, it is convinced that it has a better cog for users’ big data machines.
“We’ve essentially replicated the lifecycle of a query through Elasticsearch inside of EraSearch,” Persen says. “We wrote our own query parser to do the exact same things that Elasticsearch does. We’ve just written it in a new, more performant language using some of the architectural design principles that are part of EraDB.”
EraSearch was written in Rust, which provides extensive performance benefits over Java and its memory-sapping garbage collection routines. It runs as a container inside Kubernetes, which enables it to scale more easily than previous-generation products, like Elasticsearch and Apache Cassandra, Persen says. EraSearch leverages the data caching and indexing technologies in EraDB, the company’s database layer that runs atop an S3-compatible object store, which provides nearly limitless storage.
In short, EraSearch was built atop a modern, cloud-native architecture, something Persen does not expect Elastic to do anytime soon with the hugely popular Elasticsearch product. When you put all these elements together, an EraSearch cluster running on the same infrastructure as an Elasticsearch cluster will enjoy a 2-3x performance advantage, Persen says. Users get that benefit without giving up anything in the usability department, he says.
“You have the same ergonomics. You can use the same tools to write data in and the same tools to write data out,” Persen says. “Essentially we can give you query parity and all you have to do is deploy EraSearch alongside [your] Elasticsearch installation. You can evaluate them side by side using the exact same tools. Once you’ve deployed EraSearch and you’re ingesting data into it, everything works just like it would with Elasticsearch, so the switching cost is very low.”
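A side-by-side evaluation of the kind Persen describes could be sketched as follows: run the same query against both systems, then compare the document IDs in the two responses. This assumes both return the standard Elasticsearch response shape (`hits.hits[]._id`) and that the ingest pipeline assigns explicit document IDs, since auto-generated IDs would differ between systems. The function names are hypothetical and not part of any EraDB tooling.

```python
def hit_ids(response):
    """Extract document IDs from an Elasticsearch-style search response."""
    return [hit["_id"] for hit in response.get("hits", {}).get("hits", [])]


def compare_responses(es_response, era_response):
    """Report whether two search responses contain the same hits in the same order."""
    es_ids = hit_ids(es_response)
    era_ids = hit_ids(era_response)
    return {
        "same_order": es_ids == era_ids,
        "only_in_elasticsearch": sorted(set(es_ids) - set(era_ids)),
        "only_in_erasearch": sorted(set(era_ids) - set(es_ids)),
    }


# Hand-written sample responses standing in for real query results.
es = {"hits": {"hits": [{"_id": "a"}, {"_id": "b"}, {"_id": "c"}]}}
era = {"hits": {"hits": [{"_id": "a"}, {"_id": "b"}]}}
report = compare_responses(es, era)
```

Comparing order only makes sense for queries with an explicit sort, such as descending timestamp; for unsorted queries the two set differences are the more meaningful check.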
Because EraSearch is focused solely on log management, it is not going to replace the non-log management things that Elasticsearch does, such as relevancy ranking. Persen is a fan of relevancy ranking when it comes to Internet search engines, but not for log management.
“We’ve been able to focus our energy, focus our problem solving and remove some of those things that, while they might be useful in other domains, are not additive features,” he says. “They’re actually a hindrance because we have to do all this extra work and do all this extra processing that slows things down.”
The company favors strong consistency, which means all log data is written to S3 before an acknowledgement is sent to the user. There will be no lost data in the EraSearch pipeline, which is not the case with systems that use eventual consistency, including InfluxDB and Elasticsearch, Persen says.
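The write-before-acknowledge rule described above can be illustrated with a small sketch. It is not EraSearch's implementation: the in-memory class below is a stand-in for an S3-compatible object store, and every name in it is hypothetical. The point is only the ordering, in which the acknowledgement is produced after the durable write has returned successfully and never before.

```python
class InMemoryObjectStore:
    """Stand-in for an S3-compatible object store (hypothetical, for illustration)."""

    def __init__(self, fail=False):
        self.objects = {}
        self.fail = fail

    def put(self, key, data):
        if self.fail:
            raise IOError("object store unavailable")
        self.objects[key] = data


def ingest(store, key, batch):
    """Persist a batch of log lines, then acknowledge.

    The caller sees acknowledged=True only if the store accepted the write,
    so an acknowledged batch is never one that was dropped.
    """
    data = "\n".join(batch).encode("utf-8")
    try:
        store.put(key, data)
    except IOError:
        # No acknowledgement: the client is expected to retry the batch.
        return {"acknowledged": False}
    return {"acknowledged": True, "bytes": len(data)}
```

The trade-off is that write latency now includes the round trip to the object store, which is presumably why the design relies on batching and on independently scalable ingest nodes.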
“A lot of it is making sure we’ve designed a system that can keep up with that write load and also that’s scalable, whether it’s one node doing ingest or 10 nodes, that those can actually scale independently of each other without creating a conflict,” he says. “In a lot of systems, you’ll see, as you start to scale, there’s this bottleneck that starts to come…. So a lot of it is making some of these really important architectural decisions up front so that we have a system that’s capable of maintaining that correctness from an indexing and caching perspective, but also still being able to manage that durability so we can have those guarantees for end users.”
The company is still getting its feet underneath it, but it is eyeing roadmap additions, such as support for machine learning, which will resonate with users. “AIOps is a category that will be interesting to us,” Persen says. There aren’t ML algorithms searching for interesting correlations in the log data yet. “But it’s something we’re actively doing R&D on. It’s definitely in the roadmap for this year.”
EraSearch runs on the cloud and any S3-compatible object store. To find out more about EraSearch, check out the company’s website at www.eradb.com.