April 21, 2015

Riding the Waves of Big Data

Eric Carr

Big Data is typically defined by three core characteristics, commonly referred to as the three Vs: volume, variety and velocity. Today, we are not just creating greater volumes of data; we are also expanding the variety of data sources, while creating and needing to process this data at ever faster velocities. However, while the three Vs are accelerating in sync with one another, a business's ability to respond has evolved in waves.

While each element is important, when looking at production implementations, many organizations have focused primarily on the first wave, volume, finding ways to report on and process the huge volumes of data being created. As customers and their competencies in the big data technology stack mature, we see them moving on to the second V, variety, and then to the third, velocity: fusing a wide variety of data sources and keeping pace with the rapid velocity at which data is created in order to deliver timely insights.

So how can businesses navigate these uncharted waters?

The First Wave: The Volume Data Deluge

The more time goes on, the more challenging data management becomes. According to IDC, 90% of all existing digital data was created in the last two years, and digital data volumes will continue to double every two years. This explosion is forcing companies to think more strategically about how they manage and report on data, because current data warehouse solutions were not built to cost-effectively handle the huge volumes being created. Many big data projects are therefore driven by the need to replace or augment existing solutions. Consequently, this "first wave" of big data maturity is often focused on simply finding ways to tame, report on and manage the exponential influx of data.

Hadoop remains at the core of many of these big data programs, enabling businesses to store and manage vast quantities of data on commodity hardware at a lower cost while providing reporting and exploratory capabilities. This focus on volume supports most of the Hadoop-centric investments being made today, driven by vendors such as Hortonworks, Cloudera and MapR.

These technologies offer a clear business case for reducing costs and planning for the management and storage of growing data volumes, making it easier to get projects off the ground. Historically, however, many organizations have struggled to get past this wave and beyond reporting-centric use cases. Instead of thinking strategically about how to derive insights from the data, they focus on storing and collecting it. As a result, many struggle to gain the business intelligence they need to truly maximize the value of their data.

The Second Wave: Variety Is the Spice of Life

While many companies have been ingesting different data feeds early on for reporting purposes, many have struggled with the challenges that variety and volume pose together. Being able to handle both gives greater context to business problems and begins to break down the information silos in enterprises; this is where the second wave of data maturity comes in. Today, we are seeing the variety of data expand daily: the Internet of Things, smartphones, social media and video. Many of these data streams did not exist five years ago, yet they all affect the network and provide vital information for the business.

According to IDC, 90% of all current digital data is unstructured, meaning it often comes in incompatible formats, making it difficult to correlate and integrate into traditional analytics solutions. As we see increased growth in the use of sensor data from connected devices and the Internet of Things, this variety and resulting complexity is set to increase alongside the volume.

Not only that, but all of these different data sources create a lot of noise; not all data is created equal, and some has more value than the rest. Those who reach the second wave of maturity recognize this and start to question the data at the source, using streaming analytics to determine which data to store and analyze and which to ignore, essentially focusing on results rather than exploration so that only the relevant data is brought to the table.
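To make the idea of questioning data at the source concrete, here is a minimal, product-agnostic sketch in Python. The event fields, the relevance rule and the in-memory "store" are purely illustrative: a filter applied at ingest decides which records are forwarded for storage and analysis and which are discarded as noise.

```python
# Hypothetical ingest-time filter: keep only records worth storing.
# Field names ("severity", "source") and the threshold are illustrative.

def relevant(event):
    """Decide at the source whether an event is worth keeping."""
    return event.get("severity", 0) >= 3 or event.get("source") == "billing"

def ingest(stream, store):
    """Forward relevant events to storage; drop the rest as noise."""
    kept = dropped = 0
    for event in stream:
        if relevant(event):
            store.append(event)   # pass on to storage / analytics
            kept += 1
        else:
            dropped += 1          # ignore low-value noise at the source
    return kept, dropped

# Example usage with an in-memory "store":
events = [
    {"source": "sensor", "severity": 1},
    {"source": "billing", "severity": 2},
    {"source": "sensor", "severity": 5},
]
store = []
print(ingest(iter(events), store))  # -> (2, 1)
```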

The Third Wave: The Need for Speed

The final wave of data maturity adds the third V into the mix: velocity. While many companies are collecting and reporting on data, more advanced companies are migrating to architectures that run streaming analytics at the edge in order to sort through the variety of data being created, and even fewer are doing this in real time. When considering velocity, businesses should think in terms of how quickly they can turn data into insights, and then go one step further by embedding analytics into automated workflows and dynamic business processes to drive action and results. By taking analytics to this next stage, those riding the velocity wave have started to create triggers that automate actions, helping to speed up business processes. We are at the stage now where this next wave is breaking.
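As a simple illustration of a trigger embedded in a stream, an analytic condition evaluated on each incoming measurement can invoke an action handler directly, rather than waiting for a human to read a report. The metric, threshold and action below are invented for the example, not drawn from any specific product.

```python
# Hypothetical real-time trigger: act as soon as a condition is met.
# The metric, threshold, and action are illustrative only.

THRESHOLD = 0.9  # e.g. a link-utilization ratio

def on_congestion(measurement):
    # In a real deployment this might open a ticket or reconfigure a device.
    print(f"action triggered: rerouting traffic, utilization={measurement:.2f}")

def watch(measurements, threshold=THRESHOLD, action=on_congestion):
    """Evaluate each measurement as it arrives and fire the action on breach."""
    for m in measurements:
        if m >= threshold:
            action(m)  # automated response, no batch report in the loop

watch([0.42, 0.78, 0.93, 0.51])
```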

For example, a key justification driving adoption of Apache Spark in place of Hadoop is that it allows businesses to combine volume and velocity in real time, going beyond Hadoop's traditional batch-centric processing. As we see developments in Self-Optimizing Networks (SON), Software-Defined Networking (SDN) and Network Functions Virtualization (NFV), real-time automation underpinned by the operational intelligence that big data tools provide becomes critical.
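For readers unfamiliar with the contrast, the sketch below shows the flavor of Spark's streaming API of that era (DStreams): small micro-batches of events are processed continuously as they arrive, rather than in a periodic batch job. The socket source and word-count logic are the standard introductory example, not a production pipeline.

```python
# Minimal Spark Streaming (DStream) sketch: continuous micro-batch processing
# instead of a periodic batch job. Assumes a text source on localhost:9999.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="StreamingSketch")
ssc = StreamingContext(sc, batchDuration=5)  # process a micro-batch every 5 seconds

lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()  # results are available as each micro-batch completes

ssc.start()
ssc.awaitTermination()
```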

On the Crest of Success

As the market matures, we are seeing businesses ride from one wave to the next, and that is when things will really start to get exciting. Arguably, the next wave will come when we start to see more machine-driven control loops and predictive analytics, helping to push us closer to a data-driven, value-centric future.

About the author: Eric Carr is Senior Vice President, Engineering at Guavus, where he is responsible for leading design and development of the Guavus Reflex Platform. Eric was a Ph.D. candidate and holds a Master's degree from Stanford University. He also graduated from Carleton College with a BA in Computer Science, Economics and Math.
