April 14, 2015

Hadoop: The Tale of Data Storage to Data Processing

Steven Noels

The expectations of CIOs around Hadoop are changing. They’re demanding more ROI from their data and saying goodbye to ‘experimentation.’ Hadoop is everywhere right now, and people (and companies) are increasingly talking about “businesses becoming more data-driven.”

Since its inception, Hadoop has served as the quintessential landing zone for every line of business and the data that resides in it. The open-source framework was a data storage trailblazer that got us to where we are today. But while Hadoop was and still is an affordable and scalable way for companies to store their data (much more so than data warehouses, which can be costly and inflexible), an important shift is happening in how it is used. The shift signals more opportunity for companies: going beyond simple storage to becoming more data-driven.

According to IDC, the market for software that analyzes large data sets could top $41.5 billion by 2018, and investors have poured more than $2 billion into businesses built on Hadoop. As investors and companies continue to bet on Hadoop, now is the time to get more out of it: going beyond data storage to making that data actionable.

At the same time, Hadoop adoption will accelerate within the enterprise as more businesses build smart applications with real-time data analysis capabilities on top of the platform. We’ll see increased market consolidation, a surge in Hadoop skills, and greater involvement of the business side in Hadoop initiatives.

As Hadoop adoption continues to rise, companies are demanding more from the framework, most notably the ability to take action. This includes real-time analytics that derive insights from big data: keeping track of how key customer metrics evolve over time and acting on changing behavior. The goal is not just to show trends on dashboards, but to drive marketing execution and guide the customer experience through event detection and propagation.

To move toward the ability to take action, organizations need to bridge the gap between analytics and operations. One key concept here is the application of context. Traditional offline analytics often ignores real-time, up-to-the-minute context, such as the user’s location or the time of day. Analyzing how changes in that context affect users’ intent and preferences, and applying these insights to improve the customer experience, requires an approach with two key components.

First, organizations need a defined set of actionable data points organized at the customer level, which lets them model individual behavior and its evolution over time.

So why is it important to measure individual-level metrics and model each customer’s needs, intents and preferences? First and foremost, it allows organizations to drastically reduce the data preparation effort, which otherwise takes up 60 to 80 percent of every data project, whether predictive modeling, risk or fraud analysis, or campaigning. It also lets organizations track individual metrics over time, so they can monitor changes in behavior, detect trends and be alerted to important events. Additionally, by using descriptive, centrally governed metrics, organizations can reduce debate and misinterpretation, especially when combining data across departments.
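To make individual-level tracking more concrete, here is a minimal Python sketch. It assumes each customer metric is stored as a simple time-ordered series and that an “important event” means a new value that departs from that customer’s own trailing average by more than a fixed fraction. The class name `CustomerMetrics`, the metric name `weekly_logins`, and the default window and threshold are hypothetical choices for illustration, not NGDATA’s method.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MetricHistory:
    """Time-ordered observations of one metric for one customer."""
    values: list[float] = field(default_factory=list)

    def baseline(self, window: int) -> float | None:
        # Mean of the most recent `window` observations; None if there is no history yet.
        recent = self.values[-window:]
        return sum(recent) / len(recent) if recent else None


class CustomerMetrics:
    """Hypothetical per-customer metric store that flags large changes in behavior."""

    def __init__(self, window: int = 4, threshold: float = 0.5) -> None:
        self.window = window        # number of past periods that form the baseline
        self.threshold = threshold  # relative change (0.5 = 50%) that triggers an alert
        self._series: defaultdict[tuple[str, str], MetricHistory] = defaultdict(MetricHistory)

    def record(self, customer_id: str, metric: str, value: float) -> str | None:
        """Append a new value and return an alert message if it departs from the baseline."""
        history = self._series[(customer_id, metric)]
        base = history.baseline(self.window)  # computed before adding the new value
        history.values.append(value)
        if not base:  # no history yet (or a zero baseline): nothing to compare against
            return None
        change = (value - base) / base
        if abs(change) >= self.threshold:
            direction = "up" if change > 0 else "down"
            return f"{customer_id}: {metric} {direction} {abs(change):.0%} vs. baseline"
        return None


metrics = CustomerMetrics()
for weekly in [10, 12, 11, 9]:
    metrics.record("c42", "weekly_logins", weekly)  # builds up the baseline, no alerts
print(metrics.record("c42", "weekly_logins", 3))    # c42: weekly_logins down 71% vs. baseline
```

Here a customer who logged in 10, 12, 11 and 9 times in recent weeks and then only 3 times triggers an alert, because 3 is about 71 percent below their own 10.5 average. Comparing each customer against their own history, rather than against a population average, is what makes the metric individual-level.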

Second, organizations must maintain a data processing pipeline that is capable of handling both data at rest (i.e. the historical data ‘learned’ from the past) and data in motion (i.e. the ability to act on the present context). Leveraging data at rest is the original big data thesis: the idea of collecting massive volumes of historical data to glean insight into the future.

Many current big data projects are stalled in this phase, focused on collecting structured or unstructured data and using that historical data to build predictive models that ultimately learn how group behavior can predict or define individual behavior. However, to be truly data-driven, organizations must be able to analyze, make sense of and act on data in motion.

Harnessing data in motion means understanding the actual customer context, and doing so in real time. It’s not just location and time; it’s how data ties into things like seasonality, household or peer dynamics and events, everything that impacts customers’ decisions. Engaging customers when it matters is often as important as (if not more important than) how they’re engaged or with what offers, as long as the offer is relevant to their context. This requires the technical capability to learn from and act on customer behavioral data in real time while still accessing and maintaining the riches of historical profile data. The architectural style that combines the two is known as the Lambda architecture.
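The Lambda architecture splits processing into a batch layer that periodically recomputes views over all historical data, a speed layer that incrementally updates views for data that arrived since the last batch run, and a serving layer that merges the two at query time. The Python sketch below illustrates that split with in-memory counts of customer events. The `Event` fields, the event types and the single `cutoff` timestamp are simplifying assumptions; a production system would typically run the batch layer on Hadoop and the speed layer on a stream processor rather than in one process.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    customer_id: str
    kind: str       # hypothetical event type, e.g. "purchase"
    timestamp: int  # seconds since epoch


def build_batch_view(history: list[Event], cutoff: int) -> Counter:
    """Batch layer: recompute counts per (customer, kind) from all data at rest up to `cutoff`."""
    return Counter((e.customer_id, e.kind) for e in history if e.timestamp <= cutoff)


class SpeedLayer:
    """Speed layer: incrementally count data in motion that arrived after the last batch run."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.recent: Counter = Counter()

    def on_event(self, e: Event) -> None:
        if e.timestamp > self.cutoff:  # anything older is already covered by the batch view
            self.recent[(e.customer_id, e.kind)] += 1


def query(batch: Counter, speed: SpeedLayer, customer_id: str, kind: str) -> int:
    """Serving layer: merge the precomputed batch view with the real-time increments."""
    key = (customer_id, kind)
    return batch[key] + speed.recent[key]


history = [Event("c1", "purchase", 100), Event("c1", "purchase", 200), Event("c2", "page_view", 150)]
batch = build_batch_view(history, cutoff=200)
speed = SpeedLayer(cutoff=200)
speed.on_event(Event("c1", "purchase", 250))  # arrives after the batch run
print(query(batch, speed, "c1", "purchase"))  # 3
```

In a fuller version, the speed layer's counts for a period would be discarded once the next batch run absorbs those events, which keeps real-time state small while the batch view remains the authoritative record of data at rest.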

At the business level, organizations must move away from campaign-based marketing and instead market when the customer chooses to engage. There is a pressing need to react in real time, because acting even a few days late can lead to an unhappy customer and wasted marketing dollars.

For subscription-based companies, adopting this two-fold approach (measuring individual behavior, and harnessing data both at rest and in motion) is crucial for realizing the strategic benefits of actionable big data.

What about you? What are your tactics for implementing an actionable big data strategy?

About the author: Steven Noels is the CTO of NGDATA, where he manages technology strategy. He co-founded Outerthought, now known as NGDATA, and is the original designer of Lily, which sits at the core of the NGDATA software portfolio. He has 15 years of product management experience, delivering solutions in data reporting applications, content management and publishing systems, and large-scale legal databases. Steven is extensively networked with the open source community in and around the Apache Hadoop big data ecosystem. Prior to NGDATA, Steven held various senior roles in technology consulting and product management with Alcatel and Wolters-Kluwer, specializing in complex, large-scale data management problems and content publishing. He’s a member of the Apache Software Foundation and holds a board position in the GentBC innovation platform. Steven holds a B.Sc. in Printing Management and an Executive Master in CS, and has been lecturing at the Antwerp Management School.

Datanami