The Application Angle to Unstructured Data
This week, Tom Leyden from Amplidata broke the interview-as-marketing-vehicle mold when he thoughtfully addressed the big data needs he’s hearing about, the shifting nature of applications, and the role of cloud models.
Leyden joked that the buzz around big data is happening because “cloud computing is getting old, the recession is not over and the industry needs new hype.”
More seriously, however, he said that while right now the focus seems to be directed at big data analytics, the real meat of the issue is in all of the big unstructured data.
For an object storage company like Amplidata, the growing recognition that big unstructured data has value marks a clear opportunity. As the company’s Paul Speciale wrote not long ago, “Corporations have previously viewed big unstructured data as a burden and therefore a cost, they have turned to the lowest-cost media available for storage of this data: tape. Now that there is a realization that there is tremendous business value in unstructured data, they understand that keeping it dormant and hard to access on tape is indeed a highly inefficient choice.”
According to Leyden, “companies are turning dead tape archives into live disk archives and investigating ways to actively use the archives (rather than just spending money on tape and not accessing the data ever).” He says that the key technology here is erasure coding, an alternative to RAID that provides much more reliability with less overhead and cost. Of course, at the heart of this, at least if you ask an object storage guy like Leyden, is the belief that object storage is the latest, greatest way to store massive amounts of complex, unstructured data and that it provides a leg up for the (oftentimes legacy) applications that need to tap such data.
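To make the reliability-versus-overhead claim concrete, here is a minimal sketch that compares raw storage overhead and tolerated device losses for a few redundancy layouts. The specific layouts (8+2, 16+4) are illustrative assumptions, not Amplidata's actual parameters, and the `Scheme` class and `summarize` helper are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A redundancy layout: k data fragments plus m redundancy fragments."""
    name: str
    data_fragments: int        # k: fragments needed to rebuild the object
    redundancy_fragments: int  # m: extra fragments that can be lost

    @property
    def overhead(self) -> float:
        # Raw capacity consumed per unit of user data: (k + m) / k
        return (self.data_fragments + self.redundancy_fragments) / self.data_fragments

    @property
    def tolerated_losses(self) -> int:
        # Any m fragments can be lost without losing data
        return self.redundancy_fragments


# Illustrative layouts only (assumed for this sketch, not taken from the article).
SCHEMES = [
    Scheme("3x replication", 1, 2),
    Scheme("RAID 6 (8+2)", 8, 2),
    Scheme("erasure code (16+4)", 16, 4),
]


def summarize(schemes):
    """Return (name, overhead, tolerated losses) for each scheme."""
    return [(s.name, round(s.overhead, 2), s.tolerated_losses) for s in schemes]


if __name__ == "__main__":
    for name, overhead, losses in summarize(SCHEMES):
        print(f"{name:22s} overhead {overhead:.2f}x, survives {losses} losses")
```

Under these assumed numbers, the 16+4 layout uses the same 1.25x raw capacity as an 8+2 RAID 6 group while surviving twice as many simultaneous losses, and far less capacity than keeping three full copies.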
He says that when it comes to cloud, it’s still up in the air whether companies can actually save money, but it does lend itself to business agility. Then again, he said, for an object storage company like his own, the meaning of the word cloud, at least in the context of enterprise big data, is in question. He says that “In the storage industry Amplidata is seeing the start of a paradigm shift from file-based storage to object storage (no file system, a programmable REST API, cloud storage).”
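As a rough illustration of what "no file system, a programmable REST API" means for an application, the sketch below builds (but does not send) the HTTP requests an application might use to store and fetch an object by key. The base URL, the namespace/key path layout, and the helper names are all hypothetical; this is not Amplidata's API, only the general shape of REST-style object access.

```python
import urllib.request

# Hypothetical endpoint, assumed purely for illustration.
BASE_URL = "https://objects.example.com"


def build_put(namespace: str, key: str, payload: bytes) -> urllib.request.Request:
    """Build a PUT request that would store `payload` under namespace/key."""
    url = f"{BASE_URL}/{namespace}/{key}"
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def build_get(namespace: str, key: str) -> urllib.request.Request:
    """Build a GET request that would fetch the object at namespace/key."""
    return urllib.request.Request(f"{BASE_URL}/{namespace}/{key}", method="GET")


if __name__ == "__main__":
    put_req = build_put("archive", "scan-0001.tiff", b"example bytes")
    get_req = build_get("archive", "scan-0001.tiff")
    # Nothing is sent over the network; we only inspect the requests.
    print(put_req.get_method(), put_req.full_url)
    print(get_req.get_method(), get_req.full_url)
```

The point of the sketch is that the application addresses data by a flat key over HTTP rather than by mounting a volume and walking a directory tree.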
Leyden says, “This is probably just one phenomenon that is added to these numbers. Most enterprises still run on legacy applications for the most part. As the shift is turning to applications in the cloud, we will probably see a big wave of migrations of legacy applications to the cloud, especially as object storage helps facilitate this. How do we explain a factor 6 growth for the cloud industry? Applications.”
On that note, he claims that the term “unstructured big data” is in itself difficult to pin down due to the diversity of the data as well as the all-important applications. For instance, he points to “big science data”, which covers projects such as genomics research (both analytics and unstructured). Then there is also “big enterprise data”, which is mostly the massive volume of documents and other unstructured data that companies generate. On the other side there’s “big entertainment data”, which is specific to the film industry, where improved film quality has had a big impact on storage requirements. At yet another end of the spectrum are “big data streams”, which refers to large volumes of data generated by cloud applications such as Twitter and Facebook.
In the end, Leyden says, it’s not about the data, the structure or the vehicle for data processing and transmission (cloud or otherwise); the emphasis should be on the applications when making important “big data” decisions.