Graph Databases Everywhere by 2020, Says Neo4j Chief
In a market rife with disruptive innovation, perhaps nothing will be as groundbreaking over the next five years as the widespread adoption of graph databases, according to Neo Technology CEO Emil Eifrem.
Just four years since its founding, Neo Technology has risen to the top of the graph database heap, a category that has itself seen remarkable growth compared to other database types (see fig. 1). The company’s product, called Neo4j, is arguably the most mature of the graph databases, which are an advanced type of NoSQL database used for a variety of analytical and transactional tasks.
Today the graph database company announced a $20 million Series C round of funding, which Eifrem says validates what the company has already done, and prepares it for what it will do next. While Eifrem is in a great position to boast about Neo4j’s success, the Swedish-born technology executive is more interested in promoting the values of graph databases as a whole. Graphs are about to break out, in a major way, he says.
Graph Break Out
“One of the things that’s paved the way for this round of funding [is] the main stream-ification of graphs, and specifically the traction we’re seeing inside of big enterprise,” Eifrem tells Datanami. “It’s become clear that by the end of this decade, every single Global 2000 company will have at least one if not several graph projects within their company.”
If this sounds surprising to you, you’re not alone. While Hadoop has stolen tech headlines over the past five years, graph databases have lingered a bit on the outskirts of enterprise respectability. Forrester thinks 25 percent of Global 2000 firms will have production graph databases within two years, and Eifrem thinks it will accelerate through the end of the decade.
What’s with the graph surge? While graph theory has been around for hundreds of years, only recently have enterprises started realizing the potential of graph databases, the Neo chief says.
“In some sense, graphs have been around since the 1700s. It’s one of the really old sub-disciplines of mathematics, so it’s been around forever, from my perspective,” Eifrem says. “But only in the last 12 to 18 months have there been off-the-shelf, enterprise-ready graph databases on the market. It’s really been since then that the momentum has really started to take off.”
Eifrem refuses to accept credit for the remarkable rise of graph databases, which is clearly demonstrated in the DB-Engines.com rankings. But obviously, Neo Technology had something to do with it, if only by having a neatly packaged graph database product ready for the budding throngs of emerging graph database developers.
“We’re enjoying the effects of being the first in the market,” Eifrem explains. “I don’t think necessarily that we’re any smarter or better than other people building graph databases. We’ve just been at it for a longer time.”
Where To Graph Now
Graph databases are a general-purpose technology that sits at the junction of analytics and transactions. People may lump graph databases into the analytics bucket because they’re doing stuff in real-time that previously couldn’t be done in real-time on relational systems, according to Eifrem. (For more on Eifrem’s views of analytics versus transactional systems, see How Big Data Tech Is Bridging the Analytical-Transactional Divide.)
However you categorize them, graph databases have very distinct advantages over other big data technologies when it comes to several specific use cases. “We’re not saying graph databases are good for absolutely everything all the time,” Eifrem says. “But when it comes to a couple of distinct use cases, it’s very clear that graph databases are superior.”
Those classic use cases include things like recommendation engines. Because graph databases maintain the relationships among the data stored in the graph, they can very quickly determine how things relate to one another. Wal-Mart uses Neo4j to analyze the products that people buy, determine how those people and products relate to other people and products, and make recommendations based on those observations.
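To make the idea concrete, here is a minimal Python sketch of a relationship-based recommendation. It is not Wal-Mart's or Neo4j's implementation; the customers, products, the `recommend` function, and the scoring rule (count the purchases two customers share) are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical purchase data: customer -> set of products bought.
purchases = {
    "alice": {"tent", "lantern", "stove"},
    "bob": {"tent", "lantern", "sleeping bag"},
    "carol": {"tent", "sleeping bag", "cooler"},
    "dave": {"novel"},
}


def recommend(customer, purchases, limit=3):
    """Rank products bought by customers who share a purchase with `customer`.

    Assumed scoring rule: each candidate product gains one point per
    purchase that its buyer has in common with `customer`.
    """
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        shared = owned & items  # relationship: customers linked by common products
        if not shared:
            continue
        for product in items - owned:  # only suggest products not yet owned
            scores[product] += len(shared)
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked][:limit]


print(recommend("alice", purchases))
```

In this toy data set, alice is linked to bob and carol through the products they have in common, so their other purchases become her suggestions. A graph database would express the same person-product-person pattern as a traversal over stored relationships.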
Graph-based search is another classic use case where graphs have an inherent advantage over other technologies, Eifrem says. Cisco’s technical support business uses Neo4j to boost the quality of the search results it returns to users. It’s largely a recreation of the PageRank algorithm that Google introduced more than 10 years ago to improve the quality of Web searches on the public Internet. By identifying the most frequently cited pages in its forums and email threads, Cisco can more quickly surface the best answers to customers’ questions about routers or other equipment.
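For readers unfamiliar with PageRank, the following Python sketch shows the textbook power-iteration form of the algorithm on a tiny set of pages. It does not reflect Cisco's system, which the article does not describe in detail. The page names, the `pagerank` function, and the damping factor of 0.85 are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it cites.
    Pages with no outgoing citations spread their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical support content: three forum threads that cite an FAQ page.
links = {
    "thread-a": ["faq"],
    "thread-b": ["faq"],
    "thread-c": ["faq", "thread-a"],
    "faq": [],
}

ranks = pagerank(links)
best = max(ranks, key=ranks.get)
print(best)
```

The most frequently cited page ends up with the highest score, which is the property a support search engine could use to push the most useful answer to the top of the results.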
Because it’s an open source software company that gives away free downloads of its software, Neo Technology doesn’t know exactly who’s using Neo4j. There have been half a million downloads of Neo4j 2.0 since January of last year, Eifrem says today on his blog, but how many of those are actually used is unknown.
But the company does have 150 paying customers, including big names like Wal-Mart, eBay, Cisco, UBS, EarthLink, Telenor, CenturyLink, and Pitney Bowes. The privately held company does not share sales figures, but says revenues have doubled or tripled in each of the past three years. 2014 was a pivotal year for Neo, Eifrem says, and paved the way for graph databases and Neo4j to go mainstream in 2015 and 2016.
With $20 million more in the bank, Eifrem intends to scale the company up, with investments in the development side of the house–which is mostly handled through its Sweden office–and in its sales and marketing organization, which is handled through its headquarters in the San Francisco satellite city of San Mateo.
You can expect Neo Technology to beat the drum for graph databases a little louder in the coming months and years. “Most people in the world haven’t yet heard of graph databases and how useful they are in the enterprise,” Eifrem says. “Now is the time to get the word out about this.”
Eifrem welcomes competition for graph databases and takes a rising-tide-lifts-all-boats approach to Neo’s success. His aim is to grow the overall pie for graph databases, which he considers a very general technology that can be used widely across the board. That’s not a bad way to look at it, especially when you consider that DB-Engines diagram (see figure 1).
Neo Technology is the undisputed king of the mountain when it comes to graph databases, but it will certainly see more competition in the future. Eifrem says that attention is a double-edged sword.
On the one hand, there are a number of smaller NoSQL database startups that are looking to bolt graph engines on top of their existing storage infrastructures. These include FoundationDB and Basho Technologies, the company behind the Riak database. According to Eifrem, those approaches won’t deliver the performance of a native solution.
“If you want to really get performance at scale when it comes to graph data, you need to have a native solution,” he says. “Ultimately the architecture where you basically put the graph layer on top of an existing non-graph storage solution just does not scale. That’s actually how we started out, what we first tried back in 2000. We tried to write a layer on top of Postgres at the time. And it just won’t get you anywhere close to the same level of performance as you get with a native solution.”
The same goes for trying to bolt a graph database atop Hadoop, a la the Giraph project and others. “We just found that whenever you try to get graph data and put it in non-graph format, you’re just going to lose so much performance and scalability that it’s not worth it,” he says. “Really the beauty of graph databases comes from having fast and predictable performance when it comes to pattern matching and traversals over large data sets. And really the only way to get to that is by using a native graph architecture. HDFS is fantastic at many things, but it’s not a native graph storage system.”
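The distinction Eifrem draws can be illustrated with a toy Python sketch. It is not Neo4j's storage design and it is not a benchmark; the names, the data, and both functions are hypothetical. The first function answers a two-hop question by rescanning a flat table of edges on every hop, roughly what a graph layer over non-graph storage has to do with joins. The second follows neighbor sets stored with each node, which is closer in spirit to a native graph structure.

```python
from collections import defaultdict

# Hypothetical "knows" relationships stored as a flat edge table.
edges = [
    ("ann", "ben"),
    ("ben", "cy"),
    ("ben", "dee"),
    ("cy", "eve"),
    ("fay", "ann"),
]


def two_hops_edge_table(start, edges):
    """Layered style: every hop scans the full edge table, like a self-join."""
    first_hop = {dst for src, dst in edges if src == start}
    second_hop = {dst for src, dst in edges if src in first_hop}
    return second_hop - {start}


def build_adjacency(edges):
    """Store each node's outgoing neighbors alongside the node itself."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
    return adjacency


def two_hops_adjacency(start, adjacency):
    """Native style: every hop follows the neighbor sets kept with each node."""
    result = set()
    for neighbor in adjacency.get(start, ()):
        result |= adjacency.get(neighbor, set())
    return result - {start}


adjacency = build_adjacency(edges)
print(sorted(two_hops_adjacency("ann", adjacency)))
```

Both functions return the same answer. The difference is in how much data each hop touches: the edge-table version does work proportional to the whole table per hop, while the adjacency version touches only the neighbors of the nodes it visits.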
What keeps Eifrem awake at night are the rumblings of giants in Redmond, Redwood City, Armonk, and Walldorf. “We know of several [graph database] projects going on at these companies,” he says. “They will be rolling out graph database offerings. There’s no doubt about it.”
Eifrem welcomes the added attention that Microsoft, Oracle, SAP, and IBM can bring to the world of graph databases, up to a point anyway.
“It’s going to help get the word out about how useful graph databases are and I’m actually looking forward to that,” he says. “What I’m not looking forward to is the FUD [fear, uncertainty, and doubt] cycle. I’m sure when we get to the point where, before they start releasing their own graph database, they’re going to be starting FUD about it, and that part I’m less excited about. But it’s probably as inevitable as them releasing their graph database product.”