November 15, 2021

Hammerspace Hits the Market with Global Parallel File System


Since he left Fusion-io eight years ago, David Flynn has been working to solve the broken relationship between data and storage. That effort gave rise to Hammerspace and its Global Data Environment, which delivers a single global namespace for users and applications to access files, as well as a metadata-driven management layer that eliminates the need to copy, move, and manage data as it sits in different silos.

As CEO and co-founder of Fusion-io (acquired by SanDisk in 2014 for $1.1 billion), Flynn played a leading role in unlocking the power of solid-state storage for a new class of customers. Storing data on fast NVMe drives right next to compute helped obliterate the storage I/O bottleneck. But it inadvertently helped create another bottleneck: the rapid proliferation of data silos that are tightly coupled to the storage infrastructure on which they live.

“Basically, at a very foundational level, the relationship between data and the storage infrastructure is actually quite broken,” Flynn said. “Data is not real. Data is a mirage that’s presented by the storage infrastructure.”

This mirage forces enterprises with large data sets and large, dispersed teams to make tradeoffs in how they access that data. While object storage systems like S3 offer practically unlimited scalability, Flynn calls them a “cop out” because they introduce unacceptable latencies and push data management into the application. That leaves enterprises with a mishmash of object, file, and block storage systems that don’t meet their needs and require constant manual intervention by storage administrators to keep the data fresh and relevant.

“I became keenly aware of this at Fusion-io, where introducing what has become NVMe flash and server-local flash having such extreme high performance gave a really good reason to want to… co-locate [data] down on the servers that are using it,” said Flynn, who is the CEO and co-founder of Hammerspace. “But that just drove to the point of absurdity this challenge of the fact that data is an emergent property of the infrastructure, and to have tiny little silos in every server would kill you.”

So he set out to rethink the fundamental relationship between data and storage, and he concluded that the answer to managing a diverse and dispersed storage infrastructure is a metadata-driven management layer atop a parallel file system.

“It’s folly to think you can build a big enough, one-size-fits-all silo and stick the data in it,” Flynn told Datanami. “We have to ultimately say data needs to transcend the storage system that holds it. And the answer to doing that is to decouple metadata from data and empower the system to be viewed through the metadata and have the metadata manage the data across the infrastructure.”

This is essentially what Flynn and his team have created at Hammerspace. The company has developed a parallel file system, based on NFS version 4.2, that presents a single global namespace, allowing enterprises to store data across multiple datacenters in an active-active, eventually consistent fashion. On top of that, Hammerspace provides a metadata-driven management layer designed to significantly reduce the work human administrators must do to pre-position data where users can get it, as well as to handle snapshots, versioning, and data protection.
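To make the idea of decoupling metadata from data concrete, here is a minimal Python sketch built on loudly stated assumptions: the class names, store names, and the rule that a write simply drops other copies are hypothetical illustrations, not Hammerspace’s design or API (the real system is described as active-active and eventually consistent, which this toy does not model). Users address files only by path, while a metadata record tracks which storage systems hold an instance, so data can gain a copy in another silo without the path ever changing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    """Hypothetical metadata record: what users see, plus where the bytes live."""
    path: str
    size: int
    instances: set[str] = field(default_factory=set)  # stores holding a copy


class GlobalNamespace:
    """Toy model: one namespace of paths, data spread across independent stores."""

    def __init__(self) -> None:
        self._meta: dict[str, FileMetadata] = {}
        # Each store (a NAS, an object bucket, ...) is modeled as path -> bytes.
        self._stores: dict[str, dict[str, bytes]] = {}

    def add_store(self, name: str) -> None:
        self._stores[name] = {}

    def write(self, path: str, data: bytes, store: str) -> None:
        meta = self._meta.setdefault(path, FileMetadata(path, 0))
        # Simplification: a write makes other instances stale, so drop them.
        for stale in meta.instances - {store}:
            del self._stores[stale][path]
        self._stores[store][path] = data
        meta.size = len(data)
        meta.instances = {store}

    def place(self, path: str, target: str) -> None:
        """Add an instance on another store; the path users see does not change."""
        meta = self._meta[path]
        source = next(iter(meta.instances))
        self._stores[target][path] = self._stores[source][path]
        meta.instances.add(target)

    def read(self, path: str, preferred: str | None = None) -> bytes:
        """Resolve a path through metadata, preferring a nearby instance if present."""
        meta = self._meta[path]
        source = preferred if preferred in meta.instances else next(iter(meta.instances))
        return self._stores[source][path]


ns = GlobalNamespace()
ns.add_store("nas-onprem")
ns.add_store("cloud-bucket")
ns.write("/projects/scene.exr", b"pixels", store="nas-onprem")
ns.place("/projects/scene.exr", "cloud-bucket")
print(ns.read("/projects/scene.exr", preferred="cloud-bucket"))  # b'pixels'
```

The point of the design is that `read` never needs the caller to know which silo holds the data; only the metadata does.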

What this means is that users can access data anywhere in the world through a single global namespace, even if the data was originally stored on the other side of the world. “That means that your data is, for the first time, omnipresent across each of these and universally accessible, with all of the conveniences of high performance, parallel file access,” Flynn said. “That’s what’s revolutionary.”

Hammerspace isn’t the first distributed file system to teach NFS or SMB new tricks. In fact, the company’s parallel file system rides atop the parallelism built into NFS 4.2. One of Hammerspace’s co-founders is CTO Trond Myklebust, whom Linus Torvalds handpicked as maintainer and lead developer of the Linux kernel NFS client. So anybody adopting NFS 4.2, such as via RHEL 8, can get that benefit.

It’s the coupling of the parallel file system with the metadata-driven management layer that really sets Hammerspace apart.

“This is really two pieces. One is a parallel file system,” Flynn explained. “If you’re familiar with the high performance computing [HPC] industry and what they do there to be able to reach massive scale, those are proprietary and exotic file systems that don’t have the enterprise reliability and feature set. What we have done is we’ve taken NFS and the seeds of parallel NFS and enhanced that so that NFS itself can be a true parallel file system and still support the enterprise data services: snapshots, clones, the RAS [reliability, availability, serviceability], the enterprise capability.”
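Parallel NFS (pNFS), introduced in NFS version 4.1 and carried forward in 4.2, separates the metadata server from the data servers: a client obtains a layout describing where a file’s data lives, then performs I/O against the data servers directly and concurrently. The Python sketch below assumes a simplified round-robin striping layout; the `Layout` class and the in-memory “data servers” are hypothetical stand-ins, so it illustrates the access pattern rather than the pNFS wire protocol or Hammerspace’s actual layouts.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Layout:
    """Hypothetical striping layout handed to the client by a metadata server."""
    stripe_unit: int         # bytes per stripe
    data_servers: list[str]  # stripe i lives on data_servers[i % len(data_servers)]
    file_size: int


# Stand-in for data servers: server name -> {stripe index -> stripe bytes}.
DataServers = dict[str, dict[int, bytes]]


def write_striped(data: bytes, layout: Layout, servers: DataServers) -> None:
    """Split data into stripes and place them round-robin across data servers."""
    for offset in range(0, len(data), layout.stripe_unit):
        idx = offset // layout.stripe_unit
        ds = layout.data_servers[idx % len(layout.data_servers)]
        servers.setdefault(ds, {})[idx] = data[offset:offset + layout.stripe_unit]


def read_parallel(layout: Layout, servers: DataServers) -> bytes:
    """Fetch all stripes concurrently, going straight to each data server."""
    n_stripes = -(-layout.file_size // layout.stripe_unit)  # ceiling division

    def fetch(idx: int) -> bytes:
        ds = layout.data_servers[idx % len(layout.data_servers)]
        return servers[ds][idx]  # a real client would issue a READ to that server

    with ThreadPoolExecutor(max_workers=len(layout.data_servers)) as pool:
        # map() yields results in input order, so stripes reassemble in file order.
        return b"".join(pool.map(fetch, range(n_stripes)))


layout = Layout(stripe_unit=4, data_servers=["ds1", "ds2", "ds3"], file_size=10)
servers: DataServers = {}
write_striped(b"0123456789", layout, servers)
print(read_parallel(layout, servers))  # b'0123456789'
```

Because the metadata server is consulted once for the layout and then stays out of the data path, aggregate throughput can grow with the number of data servers.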

It’s the second component, the metadata-driven management layer, that really enables enterprises to address the fragmented nature of their data storage infrastructure, Flynn says.

“Because now your data is presented through its metadata, your data is automatically managed and orchestrated through the metadata, and therefore it decouples the data entirely from the universal storage infrastructure,” Flynn said. “People are very used to managing the data manually by setting up, by copying it themselves, or by setting up links and tools to copy it. We’re moving to a declarative model where they manage data through its metadata.”

The company’s file system has been generally available for several months and is in production at some of the largest telecom and gaming companies in the world, says Molly Presley, who recently joined the company as its SVP of marketing.

“Essentially what we’re really focusing on... are the industries that have the need for large-scale datasets,” Presley said, “whether that’s a few hundred terabytes to petabytes or hundreds of petabytes; a global workforce, whether they’re full-time employees or their data users are contractors; and where their infrastructure is global.”

Instead of moving large amounts of data from an on-prem cluster into the cloud so users can access and process it, Hammerspace lets companies store data once in its global namespace. Users on AWS, Google Cloud, or Microsoft Azure can then access it by spinning up a Hammerspace environment on their own cloud instances and simply mounting the global file system from their applications.

“Think about all of these cases where you’re moving data. Maybe it’s from the GPFS environment up into the cloud,” Presley said. “We don’t have to make multiple copies of the data. The users are interacting with it at the global namespace level, so you don’t have that added cost of two, three, or four copies of these large datasets, which is unmanageable and expensive. That’s part of what the data orchestration capabilities provide, is not just moving data around, but doing it in an efficient way, being able to warm the cache, in essence, for the application or user who’s using the data. Make sure you don’t have multiple copies because the user and the application are interacting with our namespace, not through discrete storage systems.”

The metadata-driven management layer lets Hammerspace users customize how their data is distributed. For example, users can set up a rule that says certain pieces of data should be pre-positioned in the cloud, and the file system will automatically move that data to the cloud in the background. Rules can be based on several criteria, such as when the data was last accessed or whether it has been flagged.
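A minimal sketch of such declarative rules follows, assuming a hypothetical rule format (a predicate over file metadata plus a target location) rather than Hammerspace’s actual policy language; `FileInfo`, `PlacementRule`, and the location names are illustrative only. Administrators state where matching data should be, and a reconcile step computes and performs whatever copies are missing.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class FileInfo:
    path: str
    last_accessed: datetime
    tags: set[str] = field(default_factory=set)       # e.g. user-applied flags
    locations: set[str] = field(default_factory=set)  # where instances exist now


@dataclass
class PlacementRule:
    """Hypothetical rule: files matching `when` should have an instance on `target`."""
    name: str
    when: Callable[[FileInfo], bool]
    target: str


def plan(files: list[FileInfo], rules: list[PlacementRule]) -> list[tuple[str, str]]:
    """Compute the (path, target) copies needed so every matching rule is satisfied."""
    moves = []
    for f in files:
        wanted = {r.target for r in rules if r.when(f)}  # set dedupes shared targets
        for target in sorted(wanted - f.locations):
            moves.append((f.path, target))
    return moves


def reconcile(files: list[FileInfo], rules: list[PlacementRule]) -> list[tuple[str, str]]:
    moves = plan(files, rules)
    by_path = {f.path: f for f in files}
    for path, target in moves:
        # Stand-in for a background copy; the path users see never changes.
        by_path[path].locations.add(target)
    return moves


now = datetime(2021, 11, 15)
rules = [
    PlacementRule("recent-to-cloud",
                  lambda f: now - f.last_accessed < timedelta(days=7), "aws-us-east"),
    PlacementRule("flagged-to-cloud", lambda f: "render" in f.tags, "aws-us-east"),
]
files = [
    FileInfo("/a.dat", now - timedelta(days=1), locations={"nas-1"}),
    FileInfo("/b.dat", now - timedelta(days=90), tags={"render"}, locations={"nas-1"}),
    FileInfo("/c.dat", now - timedelta(days=90), locations={"nas-1"}),
]
print(reconcile(files, rules))  # [('/a.dat', 'aws-us-east'), ('/b.dat', 'aws-us-east')]
```

Running `reconcile` a second time returns an empty list, because the desired state already holds. That idempotence is the practical difference between declaring intent and scripting copies by hand.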

“We have both reactionary and on-demand. You can simply mount the file system and start accessing stuff and it will create instances of those files in the cloud. Think of it like caching it in the cloud,” Flynn said. “If you take the time to describe what subset of data is going to be needed in advance, the system can pre-orchestrate it to already be resident.”
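The two modes Flynn describes can be sketched as a cache at a cloud site in front of a remote origin. This is an illustration under stated assumptions: `SiteCache`, its fetch counter, and describing the needed subset with a path predicate are hypothetical, and a real system would also handle consistency, eviction, and partial-file access, all of which this omits.

```python
from __future__ import annotations

from typing import Callable


class SiteCache:
    """Hypothetical cloud-site cache in front of a remote origin such as an on-prem NAS."""

    def __init__(self, origin: dict[str, bytes]) -> None:
        self.origin = origin
        self.local: dict[str, bytes] = {}
        self.remote_fetches = 0  # number of times data had to cross the WAN

    def _fetch(self, path: str) -> None:
        self.local[path] = self.origin[path]
        self.remote_fetches += 1

    def read(self, path: str) -> bytes:
        # Reactive: the first read of a file pulls it in; later reads are served locally.
        if path not in self.local:
            self._fetch(path)
        return self.local[path]

    def prefetch(self, needed: Callable[[str], bool]) -> int:
        # Proactive: pre-position every file matching a description of the needed subset.
        fetched = 0
        for path in self.origin:
            if needed(path) and path not in self.local:
                self._fetch(path)
                fetched += 1
        return fetched


origin = {"/shots/001.exr": b"a", "/shots/002.exr": b"b", "/docs/notes.txt": b"c"}
cache = SiteCache(origin)
print(cache.prefetch(lambda p: p.startswith("/shots/")))  # 2
cache.read("/shots/001.exr")   # already resident, no WAN trip
cache.read("/docs/notes.txt")  # not described in advance, fetched on demand
print(cache.remote_fetches)    # 3
```

Either way the data crosses the WAN once; the proactive mode simply moves that cost ahead of the job so the first read is already local.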

In either case, the file system is moving data behind the scenes, eliminating the need for administrators to do that work. That is a paradigm shift in how data is managed, Flynn says.

“Instead of managing in the imperative through manual techniques, we’re introducing for the first time the language in metadata for it to be self-descriptive, for the data to be in charge, to say what it needs in terms of its own very existence,” he said. “It’s like picking up the data by its own bootstraps and now you’re managing data across the diversity of infrastructure. This is a very big paradigm shift and I would liken it to managing servers in the virtual versus managing servers by racking and stacking appliances.”

Flynn says customers can get a big upgrade in data accessibility and performance by layering Hammerspace atop their existing fleets of NAS devices, including NetApp, EMC Isilon, and Qumulo appliances, “or anything that speaks NFS v3.” “That’s earth-shattering,” he said. “That’s never been done before, that you can use your pool of NAS systems or even servers (our product can take servers with local flash and disk) and turn those into storage nodes, and then you can scale out across them.”

In addition to delivering HPC-like file system capabilities on Linux operating systems that support NFS 4.2, Hammerspace also provides legacy support for NFS version 3 (as mentioned above). The file system also supports SMB, the native Windows file-sharing protocol, which approximately 60% of its customers use, Flynn said. It also supports S3 and block storage (i.e., SAN). Just about the only things it doesn’t support are iSCSI and Fibre Channel (FC) SCSI. “That’s rearview mirror. That’s looking in the past,” Flynn said.

Today, Hammerspace announced the availability of its Global Data Environment, as well as several executive hires. This is the first time the company has marketed the file system as a single global namespace rather than as a collection of point solutions, Presley said.

In addition to hiring Presley, a veteran of the enterprise and HPC storage space, Hammerspace announced the hiring of Jim Choumas as VP of channel sales and Chris Bowen as SVP of global sales. “We’re really pleased to have these very skilled senior executives joining the team,” Flynn said. “It shows I think the effectiveness of the team because these are folks that really know the industry well.”


Datanami