DDN Addresses Scalability Wall
Last week the supercomputing community gathered in Seattle for SC11 to discuss trends in the space, most of which revolve around two core themes—exascale and big data.
These two movements go hand in hand in several ways: the significant data movement, storage and related challenges posed by massive data sets must be resolved right alongside the more general computational, programming and power challenges.
In light of the conference, DataDirect Networks (DDN) produced a letter to the high performance computing community in which it discussed what it sees as necessary developments on the road to exascale.
According to the author of the piece, DDN’s Executive Vice President for Strategy and Technology, Jean-Luc Chatelain, “apart from the obvious physical challenges related to the sheer scale of achieving efficiency in compute density and energy efficiency, DDN believes that there are additional important information architecture issues that must also be tackled.”
Chatelain says that building storage systems one thousand times more capable than the current generation of petascale systems will require a great deal of collaboration among ISVs, infrastructure vendors and users of eventual exascale systems in the coming years, since “the challenge we face today can overwhelm us if we do not step outside of the traditional approaches to storage and data management.” He points to massive data demands in areas that stand to benefit from exascale computing advances, including genomics, uncertainty quantification, oil and gas, financial services and CFD, noting that storage challenges are already one of the greatest hindrances—and will become larger barriers without a new approach.
DDN says that HPC and big data are close to hitting the scalability wall and that “attempts to build on the legacy I/O infrastructure will become increasingly expensive and fragile as current file system and storage system technologies, which were originally designed for use inside a computer, are reaching the limits of their scalability.” Chatelain says that there are a few key areas of research and development that can alter the course of the petascale architecture era and bring new scale, efficiency and power to bear for exascale and big data applications.
One of the top items on DDN's list is the need to develop in-storage compute capability. This is the clear winner in exascale conversations for most storage vendors concerned with these issues, as the problems of moving massive datasets across storage and compute borders will, as Chatelain says, “outweigh the costs of the compute itself.” He points to the company's own approach of In-Storage Processing as an advancement toward these goals, noting that it is necessary not only for exascale processing requirements, but also to handle the big data demands that many enterprises are already pushing onto the vendor community.
According to Chatelain, exascale computing can benefit from some of what has already been learned from those who build purpose-driven solutions to handle massive web-scale problems. He points to an example from the web, discussing how “Map/Reduce systems actually ship functions to right where the data is stored, eliminating network bottlenecks and forwarding only the pre-processed results for further analysis.” He says that in the age of exascale, such paradigms can be leveraged to manage metadata, handle processing functions and provide integrated approaches to everything from data archiving to retirement.
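The function-shipping idea Chatelain describes can be sketched in a few lines. This is a hypothetical illustration, not DDN's or any particular Map/Reduce framework's code: each "node" holds a local shard of data, the map function runs where the data lives, and only small pre-aggregated summaries cross the network to the reducer.

```python
from collections import Counter

# Hypothetical shards: each list stands in for one node's local data.
partitions = [
    ["error", "ok", "ok", "error"],          # node 0's shard
    ["ok", "ok", "warn"],                    # node 1's shard
    ["error", "warn", "warn", "ok", "ok"],   # node 2's shard
]

def map_local(shard):
    """Runs on the node holding the shard; returns a compact partial count
    instead of shipping the raw records over the network."""
    return Counter(shard)

def reduce_global(partials):
    """Combines the small pre-processed summaries for final analysis."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Only the Counter summaries travel; the raw shards stay put.
partials = [map_local(shard) for shard in partitions]
totals = reduce_global(partials)
print(totals["ok"], totals["error"], totals["warn"])
```

The key property is that the data transferred scales with the size of the summaries (a handful of counts per node), not with the size of the shards—the bottleneck Chatelain says legacy I/O architectures cannot avoid.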
Another area of development as we move toward ever-greater storage and computing demands is the use of object stores. Chatelain says that current models are hindering scalability and creating bottlenecks, pointing to namespace spanning as a prime example. In his view, we are moving toward true object stores, and this area will continue to develop, in part because web-scale operations are already proving the model.
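The contrast Chatelain draws can be made concrete with a minimal sketch. This is an assumption-laden toy, not DDN's design: an object store maps a flat key directly to an object, so a read is a single lookup with no directory tree to traverse, lock, or span across nodes—the namespace problem he cites for traditional file systems.

```python
import hashlib

class ObjectStore:
    """Toy flat object store: key -> object, no hierarchical namespace."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # Content-derived key, a common object-store convention;
        # the caller keeps the ID rather than a path.
        oid = hashlib.sha256(data).hexdigest()
        self._objects[oid] = data
        return oid

    def get(self, oid: str) -> bytes:
        # One flat lookup; contrast with walking /a/b/c component
        # by component through a shared directory tree.
        return self._objects[oid]

store = ObjectStore()
oid = store.put(b"simulation checkpoint")
assert store.get(oid) == b"simulation checkpoint"
```

Because keys are independent of any tree structure, the keyspace can be partitioned across nodes without the cross-node namespace coordination that hierarchical file systems require.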
Knowledge management and the development of the next generation of SSDs are also targets on the DDN exascale and big data agenda. Chatelain says that while they are working on new technologies, it is “clear that the move to exascale will require disruptive innovation in the HPC architecture and the I/O subsystem in particular” if we are ever to move on to the new class of supercomputers, not to mention new systems and architectures that are equipped to tackle the growing data demands from enterprise users.
Interestingly, the company continually points not to the high performance computing community as the source of all innovation, but rather to the advancements made by web-scale companies and research efforts that are blending purpose-built architectures and new frameworks to overcome scalability and efficiency challenges. The company could be onto something; the movements it cites as key to shifting the big data paradigm are being explored everywhere from the Graph 500 to Facebook.
The full letter from DataDirect Networks can be found here.