IBM Seeks to Simplify Graph with New Titan Service
IBM today took the wraps off a new cloud-based graph service based on open source Titan and Apache TinkerPop technologies. Called IBM Graph, the new service is suitable for production graph workloads of any size. But IBM is also focusing on making graph technology easy to use and lowering the barrier to entry into the graph world.
IBM (NYSE: IBM) has been working on the graph service for almost a year, and has had a beta of IBM Graph available since February. As the company researched how people interact with the technology, it received some useful feedback, according to Chris Glew, the senior product manager for cloud data services at IBM’s Cloudant business.
IBM hopes to smooth the climb up graph mountain by providing IBM Graph users with a slew of features aimed at helping them get a basic grasp of how the technology works, and what it can do for them. The content is aimed primarily at developers, who are the main targets of this new offering.
“What we did on the front-end side was to provide an interactive, tutorial playground to help developers understand Gremlin [TinkerPop’s query language] and graph technology at the same time,” says Bhavika Shah, design lead for IBM Graph. “It’s a huge challenge to learn Gremlin as a new language, as a developer who maybe understands graph but they haven’t really interacted with it or tried it out.”
IBM is using the phrase “zero to graph in five minutes” to show how easy the service is to use, and to keep prospective users from feeling intimidated by the technology.
“Users can sign on, play with sample data sets and queries almost immediately,” Shah tells Datanami. “They can see what the Gremlin language is when they write the queries, but then they get the response back in JSON. That’s the big learning interaction that we found was very helpful–to get something back that they can understand, so they’re not completely in this new field without any feeling of where they’re going or understanding how to get to the next step.”
While IBM is taking steps on the front-end to make graph technology more accessible to the average Web developer, it’s also worked to solidify the back-end and to make the service reliable enough to be used for production workloads. The company foresees the service being used to complement existing applications, via API calls, with the types of capabilities that graph databases are inherently good at, such as product recommendations and fraud detection.
“We want to make sure that what we roll out isn’t only for people who know the secret handshake around the graph community, but at the same time, we’re not looking to deliver the Fisher Price version that has no value for the people who have been working in graph to date,” Glew says. “We’ve been very deliberate about making sure we can meet both of those needs.”
IBM Graph is available now on the Bluemix cloud. Customers can get up to 500 MB of graph database storage for free, and make up to 25,000 API calls per month for free. IBM charges $15 per month for each additional GB of storage, and 20 cents per month for each block of 1,000 API calls.
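The pricing above can be turned into a rough monthly estimate. The sketch below uses only the figures quoted in this article; the rounding rules (partial gigabytes and partial blocks of calls billed in full, and 1 GB treated as 1,000 MB) are assumptions, since the article does not say how IBM rounds usage. The function name is hypothetical.

```python
FREE_STORAGE_MB = 500          # free storage allowance, from the article
FREE_API_CALLS = 25_000        # free API calls per month, from the article
STORAGE_CENTS_PER_GB = 1_500   # $15 per additional GB per month
API_CENTS_PER_BLOCK = 20       # 20 cents per block of API calls
API_BLOCK_SIZE = 1_000         # calls per billed block
MB_PER_GB = 1_000              # assumption: decimal gigabytes


def ceil_div(numerator, denominator):
    """Integer ceiling division, avoiding floating point."""
    return -(-numerator // denominator)


def estimate_monthly_cost_cents(storage_mb, api_calls):
    """Estimate the monthly charge in cents beyond the free tier.

    Assumption: partial GBs and partial call blocks are billed in full.
    """
    extra_mb = max(0, storage_mb - FREE_STORAGE_MB)
    extra_calls = max(0, api_calls - FREE_API_CALLS)
    storage_cost = ceil_div(extra_mb, MB_PER_GB) * STORAGE_CENTS_PER_GB
    api_cost = ceil_div(extra_calls, API_BLOCK_SIZE) * API_CENTS_PER_BLOCK
    return storage_cost + api_cost
```

Under these assumptions, a graph using 2,500 MB of storage and 100,000 API calls in a month would be billed for 2 GB and 75 blocks of calls, or $45.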
The new graph service is a multi-tenant offering with 24/7 service level agreements. The service is currently being hosted in IBM data centers in Dallas, Texas, and London, England, with plans to expand it to Sydney, Australia in the future, Glew says.
IBM is running the plain vanilla version of Titan and Apache TinkerPop. As the underlying graph database and execution engine, Titan is available on GitHub, while TinkerPop was recently made an Apache project by its creators at Aurelius, which was acquired in 2015 by DataStax, the company behind the Apache Cassandra database.
There are no plans at IBM to offer a standalone graph product that customers can run on their servers, Glew says. But since the technologies behind it are open source, there’s nothing stopping customers from moving their IBM Graph workloads onto their own servers. “If you want to run this on premise, I can send you the links to TinkerPop and Titan to replicate the whole stack,” he says.
The company is adamant about not forking the core open source technologies. That’s something that DataStax did with DataStax Enterprise Graph, when it modified Titan to eliminate the code that allowed customers to plug in their choice of persistent data store. That change gives DataStax Enterprise Graph a performance edge over plain vanilla Titan, while also reducing complexity.
Like DataStax, IBM selected Cassandra to provide the persistence behind IBM Graph (Titan also supports HBase and BerkeleyDB), while it chose Elasticsearch to provide indexing. Since IBM is hosting the graph database, managing the additional complexity that comes with pluggable back-ends shouldn’t be an issue.
Performance also shouldn’t be a big issue, Glew says. “We’ve run scaling tests against the back-end, but we haven’t run till it tips over,” he says. “The fun part of being IBM now is we have resources that make it a lot harder to get to the theoretical limits of a technology that scales like Cassandra.”
IBM Graph is suitable for both analytic and transactional workloads, but the sweet spot is transactional processing, Glew says. The idea is to provide a secondary operational data store to serve graph queries in real time from other production apps. That doesn’t preclude analytic type use cases, but Glew says data scientists would be more likely to use something like a Python notebook to serve analytic needs.
You can expect IBM to evolve this offering in the future to suit data analytic needs, however. Since IBM already has Apache Spark running as a service, one could conceive of a future where data resident in IBM’s cloud-based Titan database is available to serve Gremlin queries and at the same time serve Apache Spark GraphFrame queries too. That’s come up in discussions, but is not yet available.
In the meantime, there’s yet more work for IBM to do to make graph databases easier to use, particularly around data modeling and ETL.
“What’s next for us is we want to make it easier to load the data and model it,” Glew says. “The modeling decisions have to be done as a prerequisite for loading. You can’t throw a bunch of elements in and then go figure out the relationships [later]. That’s one of the ways that graph databases don’t fit into NoSQL space…. Graphs are highly relational and highly structured.”
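Glew's point that modeling comes before loading can be illustrated with a small sketch. It declares a model first (which vertex labels exist, and which vertex types each edge label may connect) and rejects any data that does not fit it. This is plain in-memory Python under assumed names; the GraphModel and Graph classes and the labels used are hypothetical and do not reflect IBM Graph's or Titan's actual schema API.

```python
from dataclasses import dataclass, field


@dataclass
class GraphModel:
    """The modeling decisions, made up front."""
    vertex_labels: set
    edge_rules: dict  # edge label -> (outgoing vertex label, incoming vertex label)


@dataclass
class Graph:
    model: GraphModel
    vertices: dict = field(default_factory=dict)  # vertex id -> vertex label
    edges: list = field(default_factory=list)     # (out id, edge label, in id)

    def add_vertex(self, vertex_id, label):
        if label not in self.model.vertex_labels:
            raise ValueError(f"unknown vertex label: {label}")
        self.vertices[vertex_id] = label

    def add_edge(self, label, out_id, in_id):
        rule = self.model.edge_rules.get(label)
        if rule is None:
            raise ValueError(f"unknown edge label: {label}")
        if out_id not in self.vertices or in_id not in self.vertices:
            raise ValueError("both endpoints must be loaded before the edge")
        actual = (self.vertices[out_id], self.vertices[in_id])
        if actual != rule:
            raise ValueError(f"edge {label} cannot connect {actual}")
        self.edges.append((out_id, label, in_id))


# Hypothetical model: people buy products, and nothing else is allowed.
MODEL = GraphModel(
    vertex_labels={"person", "product"},
    edge_rules={"bought": ("person", "product")},
)
```

Loading a "bought" edge from a product to a person, or an edge with a label the model never declared, fails immediately, which is the structured behavior Glew contrasts with schemaless NoSQL stores.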
For more information on IBM Graph, see www.ibm.com/analytics/us/en/technology/cloud-data-services/graph/.