DataStax Dips Into Graph Waters, Pulls Out a Titan
Growing interest in graph database technology led DataStax to acquire Aurelius, the company behind the open source TitanDB graph database. DataStax tells Datanami that it plans to make the TitanDB technology available as an optional feature running on its commercial NoSQL database.
TitanDB is one of a handful of open source graph databases aimed at enabling people to group and query large sets of connected data. The open source project, which began 2.5 years ago, has garnered a strong following of users, and those users are now looking to put those graph database applications into production.
But instead of building a commercial version of the open source software and then selling technical support licenses against it, the folks behind TitanDB decided they were better off signing on with an established NoSQL database vendor. As they evaluated the market, it was clear that DataStax was the best fit.
“We felt Cassandra and by extension DataStax Enterprise to be the most viable technical platforms to build such a graph database on,” says Matthias Broecheler, the managing partner and CEO of Aurelius. “It made a lot of sense to join forces.”
The feeling was mutual on DataStax’s end, says Martin Van Ryswyk, executive vice president of engineering at DataStax. “Over the last year there’s been pretty consistent feedback that they’re interested in graph database technology,” Van Ryswyk says. “When we talked to them and dug a little deeper, they said they’re looking at this thing called TitanDB, because we think it’s the only thing that has the chance of scaling to what we can scale your Cassandra to.”
Apache Cassandra, its commercial cousin DataStax Enterprise, and Apache HBase are technically considered wide-column stores. These types of databases owe their evolution to Google’s BigTable technology, and store data in records that can be billions of columns wide, providing a large degree of availability and scalability.
But with the Aurelius acquisition and other strategic moves–such as support for analytics and the forthcoming support for JSON files–DataStax is evolving past the wide-column store origins and taking on the world of multi-modality. “We’ve already started an evolution here at DataStax of having a multi-model approach, to having one platform where you can serve multiple types of data,” Van Ryswyk says. “This multi-model approach is an evolution for us, and it made a lot of sense to add graph.”
Joining TitanDB with DataStax Enterprise will be relatively simple because TitanDB was built with a storage-agnostic model. Apache Cassandra, which forms the core of DataStax Enterprise, is one of the storage engines TitanDB already supports, along with Apache HBase and Oracle BerkeleyDB.
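For readers who want a concrete picture of what "storage agnostic" means, here is a minimal Python sketch. It is not TitanDB's actual API or on-disk layout. The names KeyColumnStore, InMemoryStore, and Graph are hypothetical and invented for illustration, as is the one-row-per-vertex layout. The point is only that the graph layer talks to an abstract interface, so the backend underneath it can be swapped.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class KeyColumnStore(ABC):
    """Hypothetical storage interface: row key -> {column: value}."""

    @abstractmethod
    def put(self, row_key: str, column: str, value: str) -> None: ...

    @abstractmethod
    def get_row(self, row_key: str) -> dict: ...


class InMemoryStore(KeyColumnStore):
    """Stand-in backend; a real one would wrap Cassandra, HBase, or BerkeleyDB."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get_row(self, row_key):
        # Return a copy so callers cannot mutate the stored row.
        return dict(self._rows.get(row_key, {}))


class Graph:
    """Graph layer that depends only on the KeyColumnStore interface."""

    def __init__(self, store: KeyColumnStore):
        self.store = store

    def add_edge(self, src, label, dst):
        # Assumed layout: one row per vertex, one column per outgoing edge.
        self.store.put(src, f"{label}:{dst}", "")

    def neighbors(self, src, label):
        prefix = f"{label}:"
        row = self.store.get_row(src)
        return sorted(col[len(prefix):] for col in row if col.startswith(prefix))


g = Graph(InMemoryStore())
g.add_edge("alice", "follows", "bob")
g.add_edge("alice", "follows", "carol")
print(g.neighbors("alice", "follows"))  # ['bob', 'carol']
```

Swapping InMemoryStore for a class that implements the same two methods against a different database would leave the Graph code unchanged, which is the property the article describes.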
While the TitanDB project will remain open source, do not expect to see Broecheler and his team contribute any new features back to it. That’s because any new features that Broecheler and company develop will be sold as a commercial add-on to DataStax Enterprise called DSE Graph. DataStax will, however, make contributions to TinkerPop, the open source graph framework whose Gremlin query language TitanDB and other graph databases use.
The companies aim to make the linkage between the two products stronger. “One of the great things about this deal is we, as the graph database guys, can actually talk to Martin now and say, Hey if DSE can do one thing for us, we could make this one functionality in the graph much more efficient,” Broecheler says. “We can make it a two-way street now, and I think that will help us greatly in improving graph technologies.”
The combination of DSE and TitanDB could even help Broecheler and his team accomplish their original goal: Making the process of building and running a massive graph database relatively easy.
“Our vision from day one has been to build a product that you can basically take off the shelf and start a Facebook with,” Broecheler says. “That’s been the vision behind it and we feel with the platform we’re building here that we’re a lot closer to that goal.”
Of course, graphs aren’t new. But the maturation of distributed graph database technology is making it one of the hottest areas of big data analytics at the moment. Several weeks ago, Neo Technology’s CEO Emil Eifrem predicted in a Datanami interview that graph databases would be ubiquitous within five years. Graph databases, he said, are very good at tackling the sorts of problems that arise when you want to do analytics very quickly on real-time transactional data.
DataStax’s Van Ryswyk agrees. “When we talk to customers, that’s what they’re trying to do,” he says. “Large retailers, healthcare companies, financial services companies–the kinds of problems they’re faced with in this new era of extremely connected data, mobile, and IOT [are about] bringing all these connections and this highly connected data together. The problem is not hundreds or thousands of things—it’s billions of things they need to work together well.”
Of course, DataStax and Neo Technology are now direct competitors. And while Neo’s Eifrem maintains that building a native graph database from the ground up to run graph workloads exclusively is the best way to achieve performance, it’s clear that this assertion will be debated in the coming months and years, especially as more NoSQL database vendors take the multi-model path and offer graph database add-ons.
“Obviously Neo4j has been in the market the longest. They’ve done an excellent job educating the market,” Broecheler says. However, he maintains that TitanDB’s distributed approach is the superior one if sheer scalability is the goal. Specifically, Broecheler says all database writes in Neo4j have to go through a single node, which impedes scalability.
“If you look at the way graph database has been presented to people [it’s presented as] imagine what Facebook or Twitter or LinkedIn or Google is doing with a large graph,” Broecheler says. “But quite honestly there’s no solution on the market right now that can scale anywhere near any of those use cases that are typically used to describe graph database. We want to be the first in the market to say hey if you actually want to build Facebook or Twitter or LinkedIn or Google, you can use this and it will actually scale to their scale, or get close to their scale, without having to dedicate a 100-person engineering team to make that possible.”