Hyperscale Analytics Growing Faster Than Expected, Ocient Says
After spending more than five years and close to a hundred million dollars to rewrite the innards of an analytics database around the superfast I/O of NVMe drives, Ocient enjoyed better-than-expected 2022 results, the company announced last month. Now the company is looking to ramp up sales of its database to the 1,000 or so global organizations that have true hyperscale needs.
“We almost tripled our orders, which is pretty amazing,” Ocient CEO and founder Chris Gladwin said of the 177% increase in bookings the company recorded in 2022 over the previous year. “That was exceeding our plan.”
Gladwin’s plan may need some tweaking.
Ocient’s story starts back in 2016, a year after Gladwin sold his previous company, object storage provider Cleversafe, to IBM for $1.5 billion. At the time, solid-state NVMe drives were just starting to creep into the enterprise. Gladwin was curious to see what it would take to capture the enormous increase in I/O throughput from NVMe drives in a database, where it can be exploited to tackle analytics challenges beyond the capability of existing products.
“The cost per million IOPS, a million I/O operations per second, [for NVMe] is 1,000 times better than a spinning disk,” Gladwin said. “It’s like, you can’t touch this thing. So the whole key is how do you use that thing, and get every last ounce of performance out of it.”
The idea of hooking a database up to fast NVMe drives isn’t new. Lots of vendors and developers have tried it. However, unless the innards of the database are rewritten basically from scratch, it won’t exploit the enormous I/O potential of those NVMe drives, Gladwin said. It comes down to basic math and physics.
“Spinning disk physically wants you to give them one thing to do at a time, because the read/write head is in one place, and that’s just how it is,” he told Datanami. “NVMe drives today want 256 tasks in parallel. The next generation is 500. The next generation [after that] is 1,000 per drive. It’s on that track. So you’re going to see just these mind-boggling numbers from the number of parallel tasks in flight.”
Database developers have managed to tweak the I/O stack to handle 10 parallel tasks at a time, but the idea of getting hundreds or thousands of parallel tasks per drive with the old architecture is just not within the technical realm of possibility, he says.
“They did mind-bending technical gymnastics to get good performance, in spite of the fact that physically when you get to the drive, it’s one thing at a time,” Gladwin said. “So basically you’re going to have to rewrite the whole I/O layer.”
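Gladwin’s point about parallelism can be illustrated with a rough back-of-the-envelope model. The sketch below applies Little’s law (throughput equals requests in flight divided by per-request latency) to the queue depths he cites. The 100-microsecond latency and the function name ideal_iops are illustrative assumptions, not Ocient figures, and real drives eventually hit controller and flash limits, so the results are an idealized upper bound rather than a benchmark.

```python
# Idealized model: IOPS = requests in flight / per-request latency (Little's law).
# ASSUMPTION: 100 microseconds per request is an illustrative figure, not a
# measurement of any particular drive.
ASSUMED_LATENCY_SECONDS = 100e-6


def ideal_iops(in_flight: int, latency_seconds: float = ASSUMED_LATENCY_SECONDS) -> float:
    """Return an upper bound on I/O operations per second for a queue depth."""
    if in_flight < 1:
        raise ValueError("in_flight must be at least 1")
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return in_flight / latency_seconds


if __name__ == "__main__":
    # Queue depths mentioned in the article: 1 (spinning disk), 10 (tweaked
    # legacy I/O stacks), then 256, 500 and 1,000 (NVMe generations).
    for depth in (1, 10, 256, 500, 1000):
        print(f"{depth:>5} tasks in flight -> up to {ideal_iops(depth):,.0f} IOPS")
```

Under this simple model, throughput scales linearly with the number of tasks the software can keep in flight, which is why an I/O layer capped at 10 parallel tasks would leave most of a 256-deep drive idle.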
Once you start pulling one string in the database, pretty soon the whole sweater is sitting on the ground. For Ocient, Gladwin’s team started with one part of the database–the I/O layer–but it quickly moved on to others.
“The I/O layer in the database is like 40% of the code,” he says. “Alright, well, if you do that, now you’ve got to rewrite your optimizer. The optimizer inside the database is another 40% of the code.”
You’ve now rewritten 80% of your database engine, but why stop there?
“While you’re at it, you probably got to go down and tweak those memory allocators,” Gladwin says. “Well, that’s another 10%. That’s the math.”
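Gladwin’s tally is easy to check. The snippet below simply adds up the shares he quotes. The dictionary and function names are hypothetical, chosen for illustration.

```python
from typing import Mapping

# Shares of the database engine's code, in percent, as quoted by Gladwin.
REWRITE_SHARES = {
    "I/O layer": 40,
    "optimizer": 40,
    "memory allocators": 10,
}


def total_rewritten(shares: Mapping[str, int]) -> int:
    """Sum the percentage of the code base that gets rewritten."""
    return sum(shares.values())


def untouched(shares: Mapping[str, int]) -> int:
    """Percentage of the code base left after the listed rewrites."""
    return 100 - total_rewritten(shares)


if __name__ == "__main__":
    print(f"Rewritten: {total_rewritten(REWRITE_SHARES)}%")
    print(f"Untouched: {untouched(REWRITE_SHARES)}%")
```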
Gladwin’s team spent five years building a new I/O layer, a new optimizer, and new memory allocators. It even did some work in Assembler to tweak the NVMe drives to work with the new Ocient database, he said. What started as a research project quickly became an expensive development project with no guarantee of a payoff.
“We had a big, giant, expensive dev team and no revenue for about five years,” Gladwin said. “So that was a little stressful.”
From the looks of it, some of that stress is melting off as Ocient clusters heat up. While the Chicago-based company wouldn’t share specifics, Gladwin sounded as if the company was getting over a critical hump as customer systems start to go live, real analytics work gets done, and revenues start to come in.
Gladwin will be the first to tell you that Ocient’s data warehouse is not for everybody. He has studied the market extensively, and has concluded that there are only about 1,000 organizations around the world that have the need for the type of big and sustained analytics throughput that Ocient can deliver.
“It’s only 10% of the $200 billion [global analytics market], but it’s a $20 billion market, which is different,” he said. “Their active data set is half a petabyte or more, and the complexity of the queries isn’t just some simple lookup–it’s a query analysis on average that’s going to make 500 CPU cores busy.”
Hyperscale analytics use cases can be found in different industries. Telecom companies have them in spades, thanks to the huge amount of metadata generated by digital communications. Trillions of dollars have been spent on the 5G rollout, so telecoms spend a little extra to ensure their 5G signals cover the areas they want. Connected automobiles generate 55MB of data per day, and analyzing all that data requires enormous and sustained computational horsepower.
Ocient has three paying customers in ad-tech, and is in talks with two more, Gladwin said. “There’s 10 million digital auctions every second, and if you want to back-test your new campaign forecast algorithm on the last three months of data times 10 million a second–okay that’s hyperscale,” he said.
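To put the ad-tech example in numbers, the sketch below multiplies the auction rate Gladwin cites by the length of the back-test window. Treating “three months” as 90 days is an assumption made here for illustration, and the function name is hypothetical.

```python
AUCTIONS_PER_SECOND = 10_000_000  # rate quoted by Gladwin
SECONDS_PER_DAY = 24 * 60 * 60
ASSUMED_WINDOW_DAYS = 90          # ASSUMPTION: "three months" taken as 90 days


def auctions_in_window(days: int = ASSUMED_WINDOW_DAYS,
                       per_second: int = AUCTIONS_PER_SECOND) -> int:
    """Count auction events accumulated over a back-test window."""
    return per_second * SECONDS_PER_DAY * days


if __name__ == "__main__":
    total = auctions_in_window()
    print(f"{total:,} auctions (~{total / 1e12:.2f} trillion) in {ASSUMED_WINDOW_DAYS} days")
```

Under those assumptions, the back-test would touch roughly 78 trillion auction records.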
There are also the government customers that have come calling that Ocient can’t talk about. Needless to say, In-Q-Tel was a participant in the $15 million extension of the initial March 2018 round of funding that Ocient announced in June 2020.
All Ocient installations to date have occurred in the cloud; AWS was the company’s first cloud partner, and GCP is its second, with more to come. Every cloud installation involves a cluster of bare-metal servers, each with its own NVMe drives. While it runs exclusively in the cloud today (one-third to one-half will be on prem in the future, Gladwin said), the company’s architecture eschews the separation of compute and storage that is so popular today.
“When you get to that kind of size, the price-performance is going to be a real challenge in other data warehouses, because you’re trying to pull that across the network and you’re just not going to have a lot of bandwidth,” he said. “In our case, compute and storage are in the same box, and there’s lots of those boxes. But they’re in the same box, in between a dedicated CPU and an NVMe solid-state drive, which is our storage tier.”
No Ethernet connections are used in an Ocient cluster between the compute and storage layers, although obviously the analytics and machine learning jobs themselves are submitted over the network.
“We don’t have a network connection between the compute and storage tiers,” Gladwin said. “We have multiple parallel PCIe 4 lanes. When we benchmarked with Snowflake and Redshift, they were in the biggest cluster you can configure, and they were getting 16 gigabits per second between their computing and storage tier across the cluster. We get 6,720.”
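The gap between those two figures is easier to appreciate as a ratio and as a scan time. The sketch below assumes that Gladwin’s 6,720 is in the same unit as the 16 (gigabits per second), uses decimal units, and times a single idealized pass over the half-petabyte active data set he describes, ignoring compression, indexing, and CPU cost. The function names are hypothetical.

```python
NETWORKED_GBITS_PER_S = 16      # figure Gladwin quotes for the networked clusters
OCIENT_GBITS_PER_S = 6_720      # ASSUMPTION: same unit (gigabits per second)
DATASET_GIGABYTES = 500_000     # half a petabyte, decimal units


def gigabytes_per_second(gigabits_per_second: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gigabits_per_second / 8


def scan_seconds(dataset_gb: float, gigabits_per_second: float) -> float:
    """Idealized time to stream the whole data set once at the given rate."""
    return dataset_gb / gigabytes_per_second(gigabits_per_second)


if __name__ == "__main__":
    ratio = OCIENT_GBITS_PER_S / NETWORKED_GBITS_PER_S
    print(f"Bandwidth ratio: {ratio:.0f}x")
    for label, rate in (("networked storage", NETWORKED_GBITS_PER_S),
                        ("local NVMe", OCIENT_GBITS_PER_S)):
        hours = scan_seconds(DATASET_GIGABYTES, rate) / 3600
        print(f"{label}: {hours:.2f} hours per full scan")
```

On these assumptions the quoted figures differ by a factor of 420, which is the difference between a full scan measured in days and one measured in minutes.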
While customers are enjoying the speed that Ocient provides on their massive data sets, Gladwin is trying to manage the growth of the company. The company has more than tripled the size of its workforce over the past three years, with most of the new hires working remotely. Growing bookings by 177% in 2022 was great, but Gladwin was shooting for 100%.
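A percentage increase converts to a multiplier by adding one to the fractional growth. The short sketch below compares the 177% bookings increase reported at the top of the article with the 100% target. The function name is hypothetical.

```python
def growth_multiplier(percent_increase: float) -> float:
    """Convert a percentage increase into a multiple of the starting value."""
    return 1 + percent_increase / 100


if __name__ == "__main__":
    actual = growth_multiplier(177)   # reported 2022 bookings increase
    target = growth_multiplier(100)   # the target Gladwin was shooting for
    print(f"Actual: {actual:.2f}x the prior year's bookings")
    print(f"Target: {target:.2f}x the prior year's bookings")
```

A 177% increase works out to 2.77 times the prior year’s bookings, which is why Gladwin describes orders as having “almost tripled,” against a target of doubling.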
“The faster the growth in a company like this, the more cash it takes,” he said. “What we’re essentially trying to do is to optimize efficiency, where we have high growth, and doubling [of revenues] and then 80%, which is plenty fast. But to go faster than that, it would become too capital inefficient.”
There are no optimizers for that.
Editor’s note: This article was corrected. The company has spent close to $100 million, not more than $100 million. Orders increased by 177%, not revenue. And it has three paying ad-tech customers, not two. Datanami regrets the errors.