February 13, 2012

Big Data and The SSD Mystique


SSDs, and more generically, non-volatile solid state technology (typified by flash today), are a hot topic in big data, not to mention the data center right now. 

The attention is justified, as there hasn’t been such a transformative technology on the horizon for several years.  With several established and startup companies developing competing architectures and solutions based around SSDs, is there room for another startup?  Aren’t we well covered between PCIe flash cards, SSDs, flash caching solutions, flash tiers in storage arrays, and all-flash application accelerators?

The answer is no.  There is still a massive gap in the market waiting to be filled.  The challenges that need to be addressed are twofold.  First, true scalability with flash media has not yet been achieved.  And second, flash has not yet been utilized in innovative ways to improve the storage experience beyond performance.

Let’s dive into the first challenge.  The solutions on the market today are a big improvement over disk-based storage.  After all, moving from tens of thousands of IOPS to hundreds of thousands of IOPS is an order of magnitude improvement (not to mention the commensurate drop in latency). 

This is very exciting for today’s end-users.  But history shows us that when new computing technology emerges, it doesn’t take long for people to figure out how to use the performance gains and then be left wanting even more.  Think about it.  Will you be forever happy with the servers and networking you have today?  Or do you want them to improve year after year to keep up with new and perhaps unpredictable demands?  The same is true for storage.  Today’s order of magnitude improvement will be quickly absorbed and deemed insufficient tomorrow.

The current generation of systems-level SSD products has not been engineered with this in mind.  The landscape consists exclusively of “scale-up” architectures that require forklift upgrades whenever performance limits are reached.  SSDs have such immense performance potential that the scale-up model cannot be sustained.

Storage processing at the array controller level inevitably becomes a bottleneck.  The long-term model for success is a “scale-out” design, where individual building blocks are clustered together in a common system, allowing capacity and performance to be added dynamically, and if well designed, without limit.  With scale-out architectures, today’s need for hundreds of thousands of IOPS can be met while still providing for tomorrow’s need for millions (and someday billions) of IOPS.
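The difference between the two models can be sketched with a toy calculation.  This is a minimal illustration only: the per-drive and per-controller figures are hypothetical, the function names are invented for this example, and the scale-out case assumes ideal linear scaling with no clustering overhead, which a real system would only approximate.

```python
# Toy model of scale-up vs. scale-out performance.
# All figures are hypothetical and chosen only to illustrate the argument.
IOPS_PER_SSD = 20_000            # assumed raw capability of one SSD
CONTROLLER_LIMIT_IOPS = 200_000  # assumed ceiling of one array controller


def scale_up_iops(drives: int) -> int:
    """Deliverable IOPS of a single array: capped by its controller."""
    return min(drives * IOPS_PER_SSD, CONTROLLER_LIMIT_IOPS)


def scale_out_iops(nodes: int, drives_per_node: int) -> int:
    """Deliverable IOPS of a cluster of identical building blocks.

    Idealized assumption: each added node contributes its full
    controller-limited performance (perfect linear scaling).
    """
    return nodes * scale_up_iops(drives_per_node)


if __name__ == "__main__":
    # Adding drives to one array stops helping once the controller saturates.
    for drives in (5, 10, 20, 40):
        print(f"scale-up, {drives:2d} drives: {scale_up_iops(drives):>9,} IOPS")

    # Adding nodes keeps raising the ceiling under the idealized assumption.
    for nodes in (1, 2, 4, 8):
        print(f"scale-out, {nodes} nodes x 10 drives: "
              f"{scale_out_iops(nodes, 10):>9,} IOPS")
```

Under these assumed numbers the single array plateaus at the controller limit no matter how many drives are added, while the clustered design grows with each node.  How close a real cluster comes to that ideal depends on how well it is designed.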

The second opportunity for startups in the SSD arena arises in the software stack.  The current crop of SSD products are adaptations of designs that originated years ago in hard disk-based arrays.  There are valid gains to be had by replacing HDDs with SSDs, but doing so doesn’t truly unlock the potential that SSDs have to offer.  End-users quickly discover that their array now reaches its performance potential with fewer drives populated, but the performance limits themselves have not changed.  In order to deliver the full potential of SSDs, an entirely new software architecture must be developed.  Only then can the performance of arrays full of SSDs be delivered.

Performance is only one aspect to consider.  Every capability in the storage software stack must also be reexamined.  Storage architects and administrators have learned over decades that there are certain “truths” in how storage systems behave, in their capabilities, and in their limitations. 

These “truths” are rooted in software architectures designed for hard drives, and will not change simply by substituting SSDs.  The next big leap in storage system capabilities will come by creatively thinking about not just the performance potential of SSDs, but how their unique properties can be exploited.  Everything from data protection to efficiency to ease-of-use to array-based copy services can be dramatically improved.

Big data is an area that can truly benefit from storage innovation based on SSDs.  Many organizations are currently constrained in the types of analytics they’re able to perform because it is impossible or uneconomical to perform the queries using today’s storage technology – even when accelerated by SSDs (they hit the array controller limits and then cannot scale out).  Or worse, the queries take too long to complete and the results are out of date before they can be used.

We have talked to retailers, telcos, financial institutions, and government entities that need to process data in real-time, in increasing volumes, and with more complex query structures in order to detect fraud, price products, or analyze quickly changing trends.  What they envision simply cannot be achieved on today’s technology, but if given the tools, entirely new use cases open up.  It may be hard to believe that somebody could make productive use of millions of IOPS, but the latent desire is there waiting to be unleashed.  Application developers and IT architects will quickly adopt these new storage and data processing technologies when they are brought to market.

Another key factor to consider with SSDs is whether to pursue a server-centric or storage-centric approach.  The server-centric approach typically involves PCIe-based flash cards populated in hosts and is analogous to direct-attached storage (DAS).  This is a great approach when the data sets are small enough to fit completely on the PCIe card and when expandability, data protection, disaster recovery, high availability, and the need to share the data set are not concerns.

Storage-centric designs place SSDs in a shared array where the resource can be accessed by multiple hosts, and in more advanced designs, is also well protected from failures and highly available.  The advantages and drawbacks are essentially the same as DAS vs. SAN with traditional disk arrays.  In the end, there is no one correct model and the application environment, performance and availability profile, and data set sizes and growth rates must all be considered.

The level of virtualization in a data center is a key determinant for both the need to use SSD technology and for whether to use a server-centric or storage-centric model.  In general, the more heavily virtualized and the more CPUs and cores per host, the more attractive SSDs and a storage-centric design become.

Virtualization running on multi-core CPUs creates highly random workloads as seen by storage devices, even if individual guest operating systems and applications are reading and writing sequentially.  For big data applications where multiple hosts need to process common data sets, a storage-centric SSD design may be the only viable choice.
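The way sequential guest I/O turns into a random workload at the device can be shown with a small simulation.  This is a sketch under stated assumptions: the guest count, block addresses, and strict round-robin interleaving are hypothetical simplifications (a real hypervisor schedules I/O far less regularly), and the function names are invented for this example.

```python
# Simulate several guests, each reading sequentially, sharing one device.
# Guest count, addresses, and round-robin scheduling are assumptions.

def sequential_stream(start_block: int, length: int) -> list[int]:
    """Block addresses for one guest reading `length` blocks in order."""
    return list(range(start_block, start_block + length))


def interleave(streams: list[list[int]]) -> list[int]:
    """Round-robin merge: the order in which the device sees the requests."""
    merged = []
    for requests in zip(*streams):
        merged.extend(requests)
    return merged


def sequential_fraction(blocks: list[int]) -> float:
    """Fraction of requests that directly follow the previous block address."""
    if len(blocks) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(blocks, blocks[1:]) if cur == prev + 1)
    return hits / (len(blocks) - 1)


# Eight guests, each reading 1,000 consecutive blocks from its own region.
guests = [sequential_stream(g * 1_000_000, 1_000) for g in range(8)]

per_guest = sequential_fraction(guests[0])       # what one guest issues
at_device = sequential_fraction(interleave(guests))  # what the device sees

if __name__ == "__main__":
    print(f"sequential fraction seen by one guest:  {per_guest:.2f}")
    print(f"sequential fraction seen by the device: {at_device:.2f}")
```

Each guest's own stream is fully sequential, yet in the merged stream no request follows its predecessor's address.  Hard drives pay a seek penalty for that pattern, while SSDs largely do not, which is one reason heavily virtualized environments favor them.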

The future is very bright for SSDs and the best of what this technology has to offer is still to come.  Fast forward a few years and we can be virtually guaranteed that applications will exist that we cannot even imagine today, made possible by the performance potential of SSDs and the architectures innovative startups are creating right now to unleash them in the data center.


About the Author

Josh Goldstein is Vice President of Marketing and Product Management at XtremIO, a provider of 100% flash scale-out enterprise SAN storage arrays.  XtremIO is currently in customer trials.  For more information, contact info@xtremio.com.
