OmniSci ‘Hitting Its Stride,’ CEO Says
Aaron Williams, the vice president of community for OmniSci, loves doing live demos. Like a trapeze artist working without a net, Williams shows no fear as he stands on stage and fires off queries from OmniSci’s GUI client to a massive database with over 10 billion rows running in GPU memory. When the responses come back nearly instantly, one of two things happens.
“I either will get an audible gasp of people who are like, ‘Wow,’ just because I’m showing 10 billion rows and showing 200 millisecond query times, with no indexing and no aggregation. It doesn’t seem to make sense to people,” Williams says.
“Or I get the guy in the back of the room who goes, ‘I call bulls***,’ which is the best thing that can happen to me, because then I’m like, okay, let’s do this,” the blue-haired Williams says at OmniSci’s inaugural Converge event this week in Mountain View, California.
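For the skeptics in the back of the room, the numbers are at least plausible on paper. A back-of-envelope calculation (ours, not OmniSci’s) shows the memory bandwidth a 200-millisecond, index-free scan of 10 billion rows would demand:

```python
# Back-of-envelope check (our arithmetic, not the company's): why a
# 200 ms full scan over 10 billion rows is plausible on GPUs.
rows = 10_000_000_000
bytes_per_value = 4          # assume one 32-bit column value per row
query_time_s = 0.2

required_bandwidth_gbs = rows * bytes_per_value / query_time_s / 1e9
print(required_bandwidth_gbs)  # 200.0
```

That works out to roughly 200 GB/s per scanned column, well within the memory bandwidth of a single modern data-center GPU, so a handful of GPUs scanning in parallel makes sub-second, index-free queries credible.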
The Need for Speed
OmniSci’s raison d’être is “speed, speed, speed,” Williams says, and the word is starting to get out. Since Todd Mostak and Thomas Graham spun the GPU database out of an MIT CSAIL research project in 2013, the technology has started to gain traction with some of the most demanding users in the world.
Telecommunications companies and federal agencies have been among the most active customers for OmniSci (formerly MapD) in its early stage of growth. Representatives from Verizon Wireless, Charter Communications, and Telus (a Canadian telecom firm) took the stage at Converge to talk about their use of OmniSci’s fast analytic platform.
Verizon, one of OmniSci’s first customers, uses the platform to analyze telecommunications data in real time to help maximize service levels for customers. The company’s database, which ingests tens of billions of records per week and lives on Nvidia DGX servers, powers dashboards that let analysts generate network reliability reports in a matter of seconds, versus 20-plus minutes before.
One federal agency that cannot be named pushes the limits of OmniSci even further. “We have a federal cyber use case that’s over 100 billion records, and it’s streaming, which is kind of unheard of,” Mostak tells Datanami. “With their previous status quo, they couldn’t do it.”
Another federal customer is using OmniSci to analyze a massive number of geospatial events. “They were running these queries on a legacy – I wouldn’t call it legacy; it was cutting edge a few years ago – analytics database and it was taking 18 hours to run it. They actually found our open source [database]…and the first time they ran it, it was like half a second. And now they’re targeting 400 times per second. So somebody who had to do in batch once a day they can now do continuous.”
For any given problem, there are multiple ways to attack it. The workloads OmniSci excels at – querying large volumes of spatiotemporal data – have traditionally run on column-oriented databases, and they’re also being targeted by in-memory data grids, in-memory NewSQL databases, stream processing systems, and Apache Spark, everybody’s favorite in-memory, distributed processing framework.
But these systems aren’t able to do what OmniSci can do with a relatively straightforward approach.
Brute Forcing It
This was a big week for Databricks, the commercial venture founded by the creators of Spark, which announced a $400 million financing round and a valuation in excess of $6 billion. Mostak appreciates how good Spark is for certain problems. But he also understands that, with its Java underpinnings, Spark will never deliver the performance of a system that runs closer to the iron.
“Spark is way faster than what came before [Hadoop], obviously,” he says. “But still, being built on a JVM, you’re going to sacrifice some speed.” (OmniSci was developed on LLVM, the C++-based compiler infrastructure, which allows it to run on any number of underlying systems.)
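For readers unfamiliar with query compilation, here is a minimal sketch of the general idea: generating a specialized filter function at runtime rather than interpreting a generic query plan. This is an illustrative analogy in Python, not OmniSci’s actual LLVM code path, and the `compile_filter` helper is hypothetical:

```python
# Hedged sketch (not OmniSci internals): the idea of query compilation.
# A SQL-style predicate is turned into a specialized function at runtime,
# the way OmniSci uses LLVM to emit machine code per query.
def compile_filter(column_name, op, literal):
    # Generate source for a tight loop specialized to this one predicate,
    # loosely analogous to emitting LLVM IR for it.
    src = (
        f"def _filter(rows):\n"
        f"    return [r for r in rows if r[{column_name!r}] {op} {literal!r}]\n"
    )
    namespace = {}
    exec(compile(src, "<query>", "exec"), namespace)
    return namespace["_filter"]

fast_rows = compile_filter("latency_ms", ">", 200)
print(fast_rows([{"latency_ms": 350}, {"latency_ms": 120}]))
# prints [{'latency_ms': 350}]
```

Compiling per query eliminates the interpretive overhead a JVM-based engine like Spark pays on every row, which is the gap Mostak is pointing at.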
Spark seems to be a frequent punching bag for OmniSci these days. The company presented benchmark results comparing an analytics workload running on a cloud-based Spark cluster versus OmniSci running on a GPU cluster. Even OmniSci running on a CPU-equipped MacBook (the company is now targeting CPUs too) outperformed the Spark cluster, according to the benchmarks.
“There’s still the big elephant in the room, which is they run these queries and Spark takes almost a day,” he says. “That’s a fundamental bottleneck. And when they use the GPU stuff, they’re like wow. We think we could run this thing in less than a minute on 10 or 15 nodes.”
Sometimes just “brute forcing it,” as OmniSci does with its GPUs (and now CPUs), is a better approach to certain problems, Mostak says. “Just by doing the fast thing, sometimes it’s better than doing the index lookups, especially at scale,” he says.
“For all the platforms that do something special, like index the data — that means a lot of pre-prep,” he continues. “If you want to change your join dimension, you’ve got to index that or re-cube the table. So being able to do it on the fly, the agility piece is just as important as the interactivity piece.”
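The trade-off Mostak describes can be sketched in a few lines: with a brute-force columnar scan there is no index to build or rebuild, so filtering on any column requires zero pre-prep. The following is our own illustration, not OmniSci code, using Python arrays with synthetic data to stand in for GPU-resident columns:

```python
# Minimal sketch (ours, not OmniSci code): a brute-force columnar scan.
# With a full scan there is no pre-built index, so filtering on any
# column is equally cheap to set up -- the agility Mostak describes.
import array

# Columnar storage: each column is a flat, contiguous array, the layout
# that GPU databases scan at memory bandwidth. Synthetic data follows.
n = 1_000_000
latency_ms = array.array("d", (i % 500 for i in range(n)))
region = array.array("i", (i % 4 for i in range(n)))

def scan_count(pred_col, pred_val, measure, threshold):
    """Full scan: count rows where pred_col == pred_val and measure > threshold."""
    return sum(
        1
        for p, m in zip(pred_col, measure)
        if p == pred_val and m > threshold
    )

# Any column can serve as the filter dimension with zero pre-prep:
slow_in_region_2 = scan_count(region, 2, latency_ms, 400.0)
print(slow_in_region_2)  # 50000 matching rows in this synthetic data
```

Swapping the filter to a different column is just another call to `scan_count`; an indexed system would first have to build an index on that column, which is the “re-cube the table” cost Mostak contrasts against.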
On the Scene
With $92.1 million in venture funding, about 100 employees, and around 100 on-prem and cloud customers, OmniSci is finding that the combination of a fast database with a rich GUI client is resonating with customers.
The company is by no means finished building its column-oriented SQL database. With version 5.0, announced this week, the company has added cohort analysis, new filtering functions, and support for user-defined functions. The product doesn’t support every SQL function, and it’s unlikely it ever will.
“It’s horses for courses,” Mostak says. “We don’t think in the near future we’re going to have full SQL capabilities like SQL Server or Teradata. We’re certainly not going to have all the bells and whistles of Tableau. It’s the 80/20 rule. People don’t need every last mile and feature. They need the basics they need most of the time.”
Thanks to its LLVM underpinnings, OmniSci is relatively agnostic to the hardware and operating systems it runs on. The software supports IBM’s Power platform, and leverages the extra memory bandwidth that Power’s NVLink can offer. Interest in OmniSci on Power is not great, however, and the new partnership with Intel, which brings support for running on CPUs and in the future will see OmniSci leveraging Optane, looks like it will bear quicker fruit.
“Things just really seem that they’re hitting their stride now,” Mostak says. “It’s taken a while to build [the platform], to get to the point where people could start using it for bigger and more diverse use cases.”