Hadoop-based RDBMS Now Available from Splice
Splice Machine today announced the commercial availability of its relational database management system (RDBMS) for Hadoop. By building a SQL-compliant RDBMS atop HBase, Splice Machine is giving customers another place to run workloads that are reaching the limits of standard databases running on commodity hardware.
When you think about Hadoop, chances are good that you think about analytic workloads running against large amounts of semi-structured data. It’s for doing data mining against Web logs or running pattern matching algorithms against customer data or analyzing customer sentiment from the Twitter fire hose. You want online transactional processing (OLTP)? Then get yourself a regular database, not a fancy file system.
That’s the conventional thinking, anyway. But the folks at Splice Machine are taking the road less traveled by building an ANSI SQL-99-compliant RDBMS that lives atop Hadoop. It may seem like an odd place to build a database. But it actually makes sense when you consider the massive investment that is currently going into Hadoop, says Splice Machine’s CEO Monte Zweben.
“We have the Hadoop layer and HBase underneath our platform and get to leverage the tremendous development community around the world, which is contributing to and hardening Hadoop for the enterprise,” Zweben tells Datanami. “The open source community is building all the ancillary tools around Hadoop for transferring and transforming data, like Sqoop and Kafka and Storm, or the machine learning packages like Mahout. All that development is being done by the open source community, and we leverage that.”
In other words, Splice Machine gets to reap the benefits of the hard work the open source community puts into Apache Hadoop (ride the coattails, if you will) without having to make the investments or do the hard work itself. That gives it a leg up on the NoSQL and NewSQL database players, who are trying to solve the same problem that Splice Machine is: providing the database to power big data applications that exceed the capabilities of standard database packages like Oracle, SQL Server, DB2, MySQL, and Sybase.
“The NewSQL guys have to build their own storage layer: their own key-value store, their own distributed architecture, their own high-availability architecture,” Zweben says. “This is a hefty undertaking and requires a great deal of time to iron out all the kinks. We get to leverage the fact that Hadoop and HBase are maturing vastly and quickly.”
Zweben identified two main types of customers during the Splice Machine private beta test period. (The product is now in public beta, but that’s not stopping the company from selling support against it.) The first type of customer has a “hair on fire” database problem that can only be solved by adopting either the “scale-up” approach on expensive proprietary hardware (i.e., running Oracle on Exadata appliances or DB2 on PureSystems) or the “scale-out” approach using NoSQL or NewSQL databases.
“The other kind of customer is a customer who has experimented with Hadoop,” Zweben continues. “Maybe they bought a cluster and started doing some data mining or analysis. But maybe this data is sitting in a parking lot. It’s just sitting there and they’re not getting a lot of use out of it. That’s where we can say, ‘You invested in Hadoop. Now you can really power applications and get the value out of Hadoop by making it real time.’ We can unlock the data that’s sitting in that parking lot of Hadoop.”
The market research firm Harte Hanks, an early Splice Machine adopter, had the first kind of problem. The $560-million company relied on the Unica marketing automation application (now owned by IBM) running atop an Oracle database to help it track leads and analyze opportunities. In addition to these components, it relied on Cognos BI tools and an ETL tool.
Harte Hanks wanted to boost the performance of Unica, but was having difficulty bringing all the pieces together at a price point that worked. After loading Unica onto Splice Machine and Hadoop, it found a possible solution, without a costly migration. “We are delighted with our initial results, which show queries executing several times faster with greater flexibility and efficiency,” Harte Hanks managing director of product innovation Rob Fuller says in a press release.
Zweben is confident that Splice Machine is onto something with his RDBMS-on-Hadoop story. And so are the company’s investors. Following its initial round of financing, the company completed a Series B round earlier this year, which netted a total of $18 million. The company is looking to grow rapidly as the version 1 product is released (hopefully) later this year, enabling the company to recognize deferred revenues. Headcount is expected to grow from 30 now to about 80 by the end of the year, Zweben says.
Co-locating analytic and transactional applications on the same Hadoop cluster makes sense in some ways, especially if both types of apps need access to the same giant pool of data and it is problematic to move that data. In his Hadoop Summit keynote last year, Hadoop co-creator Doug Cutting said there’s no reason Hadoop couldn’t do OLTP workloads.
Zweben sees three use cases where blending OLTP and online analytical processing (OLAP) workloads on the same system makes sense. The first is digital marketing and advertising technology, such as campaign management, email personalization, and ecommerce personalization. “These are the kinds of apps where people have a tremendous amount of first- and third-party data on their consumers, and they want to power apps directly with that data,” he says.
“That’s a wonderful area for the Splice Machine architecture, because you can do very long-running queries that help with audience segmentation and audience management, and at the same time be real-time and transactional in dealing with the conversions and responses that come from the campaigns, the clicks that happen,” he says.
Doing complex event processing with sensor data originating from the Internet of Things is another potential use case. “Being able to do health monitoring applications on large-scale networks of devices, or set-top boxes in the telecommunications world and so forth,” he says. Lastly, the proliferation of digital health records from EMR and EHR systems, in combination with genome sequencing data and demographic data, is providing new opportunities to develop personalized medicine in the life sciences market.
Splice Machine is available for download now. The software is free to download and test on anything from a single desktop to a massive cluster (you must contact the company to receive the clustered version). If a Splice Machine database is put into production, an enterprise subscription is required. The list price is $5,000 per node.