Basho Goes Vertical with Big Data Stack
Basho Technologies made a name for itself in the NoSQL database world by developing a scalable key-value store called Riak that’s used by the likes of Time Warner, The Weather Company, and Comcast. Today the company disclosed plans to move up the stack by integrating other big data products–including Apache Spark, Apache Solr, and Redis–into its new Basho Data Platform.
Nobody solves a big data problem with a single set of tools or skills. Big data is bigger than just Hadoop or just NoSQL or just being good at R, SQL, or Java. If this were 1992, we’d be saying that big data “takes a village.” Yet many vendors continue to sell big data products as isolated, stand-alone entities, and leave it up to their customers to integrate them into a usable form. No wonder technical services is booming within the big data space.
The folks at Basho recognized the problems customers were having with integrating the various components of their big data architectures, and stepped up to provide a solution. It’s all about solving distributed systems problems, according to Basho’s vice president of product and marketing, Peter Coppolo.
“We’re distributed systems folks at heart and the problems our customers are telling us they have are things that we can solve with distributed system technology,” Coppolo tells Datanami. “A lot of problems that people are having at doing this, it’s stuff at which we have core expertise. It’s about high availability, it’s around the replication and synchronization of data. Those are computer science problems that we’ve solved.”
The company’s new Data Platform starts with two underlying technologies: Riak KV (formerly Riak, the key-value store) and Riak S2 (formerly Riak CS, its object storage layer). On top of those core databases, the company provides core services around data replication and synchronization, cluster management, and logging and analytics. The connectors for Spark, Redis, and Solr sit above those middle layers, and more are on the way.
Streamlining Apache Spark deployments should be quite helpful to Riak customers doing in-memory analytics. While Spark lowers the programming bar for doing data science compared to MapReduce on Hadoop (not to mention speeding up the processing significantly), it is still somewhat difficult to run in a production setting. Basho recognized the problem and looked for ways it could contribute a solution.
“People who are deploying Spark usually deploy ZooKeeper to help with the cluster management,” Coppolo says. “But if you know folks who have deployed Spark, they’ll tell you that ZooKeeper is a real pain for them. It has real issues, it falls over, things like that. So we have built-in capabilities within the enterprise version of the Data Platform that offer that leader election in a transparent way, so you don’t change Spark in any way, and so you don’t have to deploy ZooKeeper.”
It’s a similar story with Redis, the popular key-value store that serves as a caching layer to speed up read-intensive applications. One of Basho’s enterprise clients, The Weather Company, has dozens of Redis instances to help serve weather data to millions of customers every day, along with Riak KV and several other NoSQL databases, Coppolo says. The company used its own resources to ensure Redis is synced up with those other databases, but it would have preferred not to.
“They’ve told us, ‘If you can make the pain associated with Redis go away, that would be great for us. We put in place our own clustering and logic to populate the cache. We did it because there wasn’t a commercial solution to do it,'” Coppolo says. “So even a customer who applied their own engineering resources to do it would rather not be in that business.”
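The "logic to populate the cache" that Coppolo describes is commonly built as a cache-aside pattern. The sketch below illustrates that general pattern only; it is not The Weather Company's code or Basho's implementation. Plain dictionaries stand in for Redis and Riak KV, and the class and method names are hypothetical.

```python
from typing import Dict, Optional


class CacheAsideStore:
    """Cache-aside sketch: reads try the cache first, then the database.

    Assumption: two in-memory dicts stand in for the cache layer and the
    database of record so the example runs without any external service.
    """

    def __init__(self, backing: Dict[str, str]) -> None:
        self.backing = backing            # stand-in for the database of record
        self.cache: Dict[str, str] = {}   # stand-in for the cache layer
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        # Serve from the cache when the key is already there.
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        # On a miss, read the database and populate the cache for next time.
        self.misses += 1
        value = self.backing.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key: str, value: str) -> None:
        # Write to the database, then drop the cached copy so that the
        # next read cannot return a stale value.
        self.backing[key] = value
        self.cache.pop(key, None)
```

Keeping the two stores consistent is the part that takes engineering effort: every write path has to remember to invalidate or refresh the cache, which is the kind of work the customer in the quote says it would rather not own.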
The Bellevue, Washington software firm already delivered a deeper level of integration with the Solr search engine with the launch of Riak version 2 earlier this year. That integration ensures that any data that’s read into the Riak key-value store is automatically indexed in Solr. That’s now a part of the Data Platform; Basho also has plans to incorporate Elasticsearch into the platform in a similar manner.
The Data Platform represents a new approach for Basho, and there’s more to come with this multi-modal strategy. For example, the company has plans to add another core database to the platform, with the leading options including a graph or columnar store. For customers who need those capabilities today, the company provides integration with leaders in those fields, such as Neo4j.
Today’s launch represents a new chapter in Basho’s book. The company, which has more than 200 enterprise customers, including 30 percent of the Fortune 50, was on the ropes last year before ownership brought in new management. CEO Adam Wray and CTO Dave McCrory are credited with steering the company in a new direction and boosting revenues in late 2014 and early 2015 to record levels.
There are many ways to slice the big data pie. The line separating transactional big data (NoSQL) and analytical big data (Hadoop) environments is starting to blur as projects like Spark–which runs just as well in Hadoop as it does in NoSQL–gain steam. With the Data Platform, Basho is taking a page out of the playbook of the Hadoop distributors and proposing a vertically integrated big data architecture that includes multiple pre-bundled and pre-integrated engines for doing different stuff with your data–only instead of basing it around a common file system, it’s based around a full database.
Only time will tell if this strategy plays out. What’s clear at this point is that the amount of technical services and manual intervention required to build and run big data applications today will most certainly go down. Whoever can solve that problem with software ultimately wins.
While Basho maintains its commitment to open source software, most of the key capabilities in the Data Platform are not free and must be licensed from Basho. The enterprise version of the Data Platform (which includes Riak KV and Riak S2) costs $7,500 per node. If you want Data Platform to manage your Redis or Spark clusters, you will pay an extra $1,000 per instance of those products; Solr integration comes free with purchase of the Data Platform.
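To make the pricing concrete, here is a minimal cost sketch based only on the figures quoted above. It assumes the $1,000 surcharge applies to each managed Redis or Spark instance; the article does not state the billing period, so the result is simply a total in dollars. The function name and parameters are hypothetical.

```python
ENTERPRISE_PER_NODE = 7_500   # enterprise Data Platform, per node (from the article)
ADDON_PER_INSTANCE = 1_000    # per managed Redis or Spark instance (from the article)


def data_platform_cost(nodes: int, redis_instances: int = 0,
                       spark_instances: int = 0) -> int:
    """Estimate the license cost in dollars under the assumptions above.

    Solr integration is not counted because the article says it comes
    free with purchase of the Data Platform.
    """
    if min(nodes, redis_instances, spark_instances) < 0:
        raise ValueError("counts must be non-negative")
    base = nodes * ENTERPRISE_PER_NODE
    addons = (redis_instances + spark_instances) * ADDON_PER_INSTANCE
    return base + addons


# Example: 5 nodes, 3 managed Redis instances, 2 managed Spark instances.
print(data_platform_cost(5, redis_instances=3, spark_instances=2))  # 42500
```

In this example the five nodes account for $37,500 and the five managed instances add $5,000.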