November 18, 2011

SC11 Video Feature: Garth Gibson on RAID, Roots and Reliability

Nicole Hemsoth

For those familiar with the storage vendor ecosystem inside high performance computing, Panasas has been a recognizable name for a decade, as has the name of the man behind the initial technologies.

Garth Gibson, cofounder and CTO at Panasas, spent some time talking with us during the annual Supercomputing Conference (SC11) in Seattle. In addition to going into depth about the file systems that can support big data, we talked more generally about the research roots of his company, as well as what is around the corner for customers with ever-growing data sets.

During our conversation below, he took some time not only to put his company’s technologies in context, but to offer a sense of differentiation between the proliferating options for file systems and storage setups. An iconic figure in high performance computing, Gibson is best known for his roots in RAID, which was born out of his research, first at the University of California, Berkeley, and then at Carnegie Mellon University. According to Gibson, these initial technologies grew out of his sustained interest in large-scale, high performance storage and his desire to achieve scalability and reliability.

In the video above, Gibson describes the process of getting the initial RAID technologies off the ground and into mainstream HPC, prefacing his history in high performance storage by noting that he’s simply an academic researcher working on storage systems who decided that, to have an impact on the world, he’d have to do more than write a lot of papers.

As he described, when he had trouble even giving away the RAID technology back in the mid-to-late 1990s, he knew he “had to leap in with both feet and produce solutions that would solve customer needs.” He told us that he had always been interested in scale and reliability, as well as in applying these benefits to the largest-scale systems, and that he now sees direct parallels between many problems in HPC and those that arise in the big data era.

Gibson says that while the scale of the data is roughly the same between HPC and big data applications, there are some important differences. Speaking from a storage standpoint, Gibson reflects on his experiences with students working on CMU Hadoop projects, and on how innovative storage approaches fit into both the big data and HPC boxes.

To close, it seems appropriate to point to another Panasas figure, Brent Welch, Director of Software Architecture, who said the following about the role of Panasas in the era of big data:

“To really process big data sets well, computer applications and systems must be capable of handling massive parallelism and the underlying storage systems must be architected for parallelism, too. Simply providing client-side support for the pNFS protocol will not deliver the performance seen by extreme big data systems like Roadrunner and Cielo.

Panasas has been instrumental in enabling parallelism in scientific discovery, dramatically improving the ability to deliver extreme performance for big data. This breakthrough in core scientific research is now becoming the standard for the next generation of commercial applications and systems. Just recently we saw the acceptance of pNFS, a parallel file system standard pioneered by Dr. Garth Gibson at Panasas, into the Linux distribution.”