March 16, 2015

Qumulo Comes Out of Stealth with ‘Data-Aware’ Storage

For the past three years, a Seattle company has been working to solve what its founders consider the biggest problem affecting large-scale storage: actually knowing what data you have and how it’s being used. Today, that company, called Qumulo, emerged from stealth with a product ready to sell.

In 2012, Qumulo set out on its mission to solve storage by conducting more than 600 interviews with large-scale storage users, including Hollywood special effects studios, genomic sequencers, and oil and gas exploration companies. With $67 million in venture funding, the company could afford to take its time. It was founded by the creators of the object-storage company Isilon, which EMC acquired in 2010 for $2.25 billion.

“We came to realize that data administration, more than anything else, is all about questions,” says Qumulo co-founder and CEO Peter Godman. “What do I actually have? What is growing? Who’s using the data—what applications and people? How are they sharing the data? What stuff could I archive? What stuff do I need to back up?”

Answering those questions was fairly easy when the data volumes were in the 1 million to 10 million range. “But they became difficult to impossible in the 10 billion to 100 billion landscape,” Godman tells Datanami. “Increasingly the entire industry is getting to that point, where we’re managing billions of digital assets, but we increasingly have no idea what we actually have or how it’s getting used.”

Qumulo thinks it has taken the first steps toward solving that problem with Qumulo Core, a new software-defined storage system with built-in analytics to answer the dreaded questions that nag at storage admins.

Qumulo built its own file system, called the Qumulo Scalable File System (QSFS), into the product, which is available pre-installed on Qumulo’s flash-equipped storage arrays or can be licensed as a software-only offering.

What makes QSFS (which exposes an NFS-compliant API) unique is that it brings its own metadata database for tracking information about the storage. “Rather than being the dumb tree structure, it’s actually a tree structure plus a database of metadata that allows you to answer complicated queries about data footprints, and do so very rapidly,” Godman says.
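
The article does not describe how QSFS stores or queries its metadata, so the following is only a minimal sketch of the general idea Godman describes: a directory tree in which every node also carries aggregate metadata, so that a footprint question can be answered without scanning every file. All names here (`Node`, `MetadataTree`, `add_file`, `largest_children`) are hypothetical and are not Qumulo's API. The sketch also assumes each file path is added only once.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A file or directory that caches totals for its whole subtree (hypothetical)."""
    name: str
    parent: Node | None = field(default=None, repr=False)
    children: dict[str, Node] = field(default_factory=dict)
    total_bytes: int = 0  # bytes stored at or below this node
    total_files: int = 0  # files stored at or below this node


class MetadataTree:
    """Illustrative tree-plus-aggregates structure; not Qumulo's implementation."""

    def __init__(self) -> None:
        self.root = Node("/")

    @staticmethod
    def _parts(path: str) -> list[str]:
        return [p for p in path.split("/") if p]

    def _find(self, path: str) -> Node:
        node = self.root
        for part in self._parts(path):
            node = node.children[part]  # raises KeyError for unknown paths
        return node

    def add_file(self, path: str, size: int) -> None:
        """Record a new file and roll its size up into every ancestor."""
        node = self.root
        for part in self._parts(path):
            if part not in node.children:
                node.children[part] = Node(part, parent=node)
            node = node.children[part]
        # Walk back up to the root so each directory's totals stay current.
        current: Node | None = node
        while current is not None:
            current.total_bytes += size
            current.total_files += 1
            current = current.parent

    def usage(self, path: str) -> int:
        """Bytes consumed under a path, read straight from the cached total."""
        return self._find(path).total_bytes

    def largest_children(self, path: str, n: int = 3) -> list[tuple[str, int]]:
        """The n biggest consumers directly under a path."""
        ranked = sorted(
            self._find(path).children.values(),
            key=lambda child: child.total_bytes,
            reverse=True,
        )
        return [(child.name, child.total_bytes) for child in ranked[:n]]


tree = MetadataTree()
tree.add_file("/projects/render/frame_0001.exr", 40_000_000)
tree.add_file("/projects/render/frame_0002.exr", 42_000_000)
tree.add_file("/projects/docs/notes.txt", 2_000)
tree.add_file("/home/alice/data.csv", 5_000_000)

print(tree.largest_children("/"))  # [('projects', 82002000), ('home', 5000000)]
```

Because the totals are maintained at write time, asking which directory is eating capacity touches only the children of one node and never walks the full tree. That is the trade-off such a design makes: slightly more work per write in exchange for fast answers to footprint queries.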

The metadata will make life much easier for storage administrators who are wondering what’s happening to their environments. “Say I leave on Friday afternoon, and my storage is at 67 percent,” Godman says. “When I come back Monday morning, it’s at 92 percent. I know in the next eight hours I’ll be out of capacity, but my storage has nothing to say about where it’s going. I actually have to go out on the periphery of the storage environment (all the client machines and render nodes or HPC nodes or whatever they are) in order to figure out what’s consuming this resource.”

A similar problem exists in performance, where something suddenly gobbles up storage I/O capacity, but the actual storage environment provides no insight into the problem, Godman says.

We’ve come a long way from the 1990s, when scaling storage was the biggest problem. The advent of object-storage systems in the 2000s, such as Amazon S3 and those from Isilon, Cleversafe, and Amplidata, brought us further along.

“[Object-based storage] made it so you can put all your data in one giant bucket, and scale that bucket out indefinitely,” he says. “It took the piece out of the problem of scalable storage administration, but it didn’t do anything at all about data administration.”

Godman sees building the analytics directly into the object-storage system as the answer. It’s more efficient than building your own storage scanners, or buying a third-party product to pull the insights out.

Currently, Qumulo Core only provides monitoring. It gives administrators the real-time information they need to make decisions, but administrators are still on their own when it comes to taking actions to stop the problem, such as cutting off an IP address or killing a runaway job. In the future, you can expect to see automated actions being added to the Qumulo interface, Godman says.

While it’s been in stealth for three years, Qumulo has actually been shipping product since August and has 15 customers today, including Sinclair, Antfarm, Telus Studios, a supermajor oil and gas company, and three of the top five film animation studios. Qumulo Core starts at about $50,000.
