NoSQL Database Scales to New Heights
Distributed NoSQL databases were designed to scale beyond what relational databases can do. That’s nothing new. But when NoSQL database vendor FoundationDB today announced that the latest version of its database, version 3, was clocked handling almost 15 million random writes per second, it makes you wonder just how much scalability we might need.
The first two releases of the FoundationDB database, which is based on a key-value store engine, maxed out at about 400,000 random writes per second, according to FoundationDB co-founder and CEO Dave Rosenthal. But by overhauling the core of the database and parallelizing as much as it could, the Virginia startup was able to coax a 25x increase out of the database’s random write performance.
“This represents not just a huge improvement for us but a huge step forward over what other distributed databases are talking about,” he tells Datanami. “It’s a really exciting time for us. We get to show the market that there really isn’t the tradeoff they thought there was.”
The tradeoff he’s referring to is exchanging transactional consistency for speed and scalability. Rosenthal says the ding against transactional databases–i.e., those that adhere to the ACID precepts of atomicity, consistency, isolation, and durability–historically has been that you can’t make them as fast as non-transactional databases.
But by ringing up 11 million random writes per second on a 32-node cluster on the Amazon Web Services cloud–and then following that up last week with a run measured at 14.4 million random writes per second, or 14.4 MHz as Rosenthal says on his blog–FoundationDB gave itself something to cheer about.
The announcement comes on the heels of several other NoSQL database benchmarks, including one from Netflix running Cassandra, one from Google running Cassandra, and another from Aerospike’s database running on Google. All three of these demonstrated a sustained capacity for 1 million random writes per second.
Rosenthal was excited to announce that FoundationDB beat those benchmarks with a transactional, ACID-compliant database. “The key property we’re talking about is the ability to do multi-key ACID transactions: Isolation and atomicity of multiple operations,” he explains. “This is what ACID has meant for three decades in SQL database space. But there’s a lot of confusion around the term now in the NoSQL space.”
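The property Rosenthal describes, atomicity and isolation across multiple keys, can be illustrated with a toy in-memory store. This is a hypothetical sketch of the concept, not FoundationDB's actual API or engine: either every write in a transaction lands, or none does.

```python
import threading

class TinyKV:
    """A toy in-memory store illustrating multi-key atomicity:
    all writes in a transaction are applied together, or not at all."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def transact(self, writes, expected=None):
        """Apply all key -> value writes atomically.
        `expected` optionally asserts current values first;
        on any mismatch the whole transaction is rejected."""
        with self._lock:
            if expected:
                for k, v in expected.items():
                    if self._data.get(k) != v:
                        return False  # conflict: nothing is applied
            self._data.update(writes)  # every write lands together
            return True

    def get(self, key):
        with self._lock:
            return self._data.get(key)

# Move 10 units between two keys as one atomic, multi-key operation:
db = TinyKV()
db.transact({"alice": 100, "bob": 0})
ok = db.transact({"alice": 90, "bob": 10}, expected={"alice": 100})
print(ok, db.get("alice"), db.get("bob"))  # True 90 10
```

A non-transactional store might apply the debit but lose the credit under failure; the multi-key guarantee rules that out, which is exactly what makes it hard to parallelize.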
FoundationDB had to re-write the guts of the database to get it to scale. “What we’ve done in 3.0 is sort of like a heart and lung transplant for the database,” he says. “We’ve re-written three of the four major components in the database and built them with a new scalable architecture.”
In the first two releases of the database, all the transactions had to flow through a single server, which put a serious damper on the parallelism the product was supposed to achieve and hurt scalability. With version 3.0, FoundationDB eliminated that bottleneck, Rosenthal says. “It has a fully parallel data path. It’s hard to accomplish that while maintaining all the strong ACID guarantees. So that’s what we accomplished for 3.0.”
The company also significantly re-worked Flow, a programming extension to C++ the company developed to implement “actor-based concurrency,” which is critical for achieving the massive parallelism FoundationDB sought. The enhancements in Flow version 2 were critical for achieving the gains realized in the key-value store, Rosenthal says.
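The actor model Flow brings to C++ can be loosely illustrated in Python with asyncio. In this analogy (a rough sketch, not Flow itself), each actor owns its own state and reacts to messages arriving on an inbox, so many actors can run concurrently without shared-state locking:

```python
import asyncio

async def writer_actor(inbox: asyncio.Queue, results: list):
    # An "actor": it owns its state and only reacts to inbox messages,
    # the core idea behind actor-based concurrency (Python analogy).
    while True:
        msg = await inbox.get()
        if msg is None:  # shutdown sentinel
            break
        results.append(f"wrote {msg}")

async def main():
    results = []
    inbox = asyncio.Queue()
    actor = asyncio.create_task(writer_actor(inbox, results))
    for i in range(3):          # send the actor three write requests
        await inbox.put(i)
    await inbox.put(None)       # ask it to shut down
    await actor
    return results

print(asyncio.run(main()))  # ['wrote 0', 'wrote 1', 'wrote 2']
```

Flow compiles similar actor definitions down to efficient C++ callbacks, which is what lets FoundationDB juggle huge numbers of concurrent operations per process.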
Despite all the changes, the FoundationDB API remains the same, enabling users to get the scalability benefits without a lot of additional work. What’s more, in addition to the 25x improvement in throughput, latency has dropped by between 4x and 10x, Rosenthal says.
You don’t have to be a key-value wizard to get the benefits of this kind of scalability and latency. The company has already developed two layers that sit atop the key-value store: a SQL layer and a graph database. The company is expected to develop a document database layer next.
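The layer idea is that a richer data model is encoded onto ordered key-value pairs underneath. As a hypothetical sketch (FoundationDB's real layers are far more sophisticated), a document-style layer might flatten nested documents into key tuples like this:

```python
# Hypothetical document layer: flatten JSON-like documents into
# (key-tuple, value) pairs suitable for an ordered key-value store.
def doc_to_kv(doc_id, doc, prefix=()):
    """Flatten a nested dict into (key-tuple, value) pairs."""
    pairs = []
    for field, value in doc.items():
        path = prefix + (field,)
        if isinstance(value, dict):
            pairs.extend(doc_to_kv(doc_id, value, path))  # recurse into sub-documents
        else:
            pairs.append(((doc_id,) + path, value))
    return pairs

store = {}  # stand-in for the underlying key-value store
for key, value in doc_to_kv("user:1", {"name": "Ada", "address": {"city": "London"}}):
    store[key] = value

print(store[("user:1", "address", "city")])  # London
```

Because all keys for one document share a common prefix, a range read over that prefix reassembles the whole document, which is why an ordered key-value store makes a good foundation for higher-level models.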
There’s a bit of room to grow with FoundationDB 3.0. The Amazon cluster ran on some big machines and was the equivalent of about 500 nodes. “That’s on the high end for how you want to run FoundationDB in the real world,” Rosenthal says. “I think we can probably push up to 1,000 nodes. But beyond that it’s a bit unsteady and it would be a little bit of the Wild West.”
Few organizations today can actually use 15 million random writes per second. But Rosenthal sees many more on the horizon, especially when the Internet of Things (IoT) trend heats up and billions more smart devices are hooked into the network. “We absolutely are going to continue walking our way up the curve. One hundred million writes per second–there are people who are demanding workloads like that,” he says.
“One of the really interesting things about IoT is it’s really applicable to this random write benchmark. The way the math works on this, you just say ‘How many devices are there in the world, and how many data points are they sending in, and how often are they sending them in?’ If you just do the math, you come up with astronomical numbers.”
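The back-of-envelope math Rosenthal describes is easy to reproduce. With purely hypothetical figures (the device count, reporting rate, and points per report below are illustrative assumptions, not numbers from Rosenthal), the estimate quickly lands in his hundred-million-writes range:

```python
# Back-of-envelope IoT write-rate estimate (all figures hypothetical):
devices = 10_000_000_000     # assume 10 billion connected devices
points_per_report = 1        # data points sent per report
reports_per_minute = 1       # each device reports once per minute

writes_per_second = devices * points_per_report * reports_per_minute / 60
print(f"{writes_per_second:,.0f} writes/sec")  # 166,666,667 writes/sec
```

Even these modest per-device rates put the aggregate well past 100 million writes per second, which is the "astronomical" scale the quote is pointing at.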
The new benchmark, which you can read about at FoundationDB’s blog, could give an astronomical boost to the company itself. Founded 4.5 years ago, FoundationDB is growing at a good 2x annual clip at the moment, enough to force the firm’s 37 employees to relocate to a new headquarters in 2015.