The Wild West and Last Frontier of Big Data
We are in the Wild West of big data. Processing speeds keep increasing, while the volume of data that can be processed is beyond what could have been imagined just a few years ago. The Last Frontier of big data, meanwhile, is the discovery of value hidden in disparate data sources that have yet to be blended and harmonized. Just like the gold-seeking pioneers of centuries past, big data pioneers who embrace this challenge and blaze their way through the Wild West will be rewarded during the upcoming Gold Rush.
To understand where we are going, let’s first understand where we have been. Realize that the “easy” problems have all been tackled. The first generation of data miners picked up most of the big nuggets of informational gold that were lying out in the open. Going forward, things get tougher. You’re going to need to dig deeper and process higher volumes of data “pay dirt.” You’re going to need to mine this data faster than ever—potentially as fast as it is generated. That’s not an easy goal, but it is a deterministic and finite one.
This problem is bounded by hardware and software, and to solve it we’ll need to address several things. We’ll need a close inspection of the workflow. We’ll need bigger and faster servers and pipes. And most of all, we’ll need more creativity on the part of the humans designing and manning the big data mills. Collectively, this will lead to better systems that are capable of solving this real-time problem.
Building such a system will be an iterative process. The big data practitioner will move slowly at times and more quickly at others, but eventually the state of the art will improve. Perhaps our biggest challenge is knowing that the tools themselves will not generate business value. We’re in uncharted territory, and there is no known or obvious route to the promised land. Therefore, we cannot have a deterministic approach that can be algorithmized, prototyped, and deployed. We’re working in a new realm.
The “Last Frontier” Is Not a Technical Problem
We are on the verge of the Last Frontier of big data. The tools for handling enormous amounts of data faster are there and will only continue to get better. However, it is not obvious how to discover and internalize the increasingly evident and abundant set of disparate data sources, both those that have always existed and the new ones that are constantly sprouting. The Last Frontier of big data is the ability of any enterprise to discover the data that holds the signals that, when combined with the enterprise’s own data, will unlock hidden value.
The “Wild West” Attitude
So what will it take to get to the Last Frontier? You need a “Wild West” attitude, of course! Enterprises will need to find and encourage employees who:
- Are willing to go the extra mile through a mix of creativity, foresight, and technical ability;
- Can understand what data sources they have access to;
- Can identify the data sources they need, which might or might not exist yet, then find or create these data sources and blend them together to uncover the truth.
Enterprises will need to understand that there is a new type of R&D, one that is about discovering data variety and experimenting by bringing these disparate data sources together.
Since this is the Wild West, this process will be fraught with high expectations, missed opportunities, and tangents that never pay off. However, this will become the new form of R&D, so you should factor it into your budget and the cost of doing business. Participating in the taming of the Last Frontier and surviving the Gold Rush will require patience, resources, and trust, supported by a new culture and new processes. No guts, no glory!
Enterprises will need to invest in a new kind of big data platform that focuses on the fast blending of disparate data sources. Platforms that offer pre-organized, discovered, and provisioned data sets, which would otherwise require manpower to discover, purchase, and provision, will be preferred. Platforms that automatically recommend data sets for blending based on the use case and on the enterprise’s behavior and usage of the system will enable this Wild West attitude.
Mine the Hidden “Truth”
Many enterprises are failing to realize the value of their big data investments. This is because the Last Frontier problem requires creativity and a deep understanding of how an enterprise delivers value to its customers and of its entrenched position in an intricate, industry-wide value network of dependencies and supply relationships.
The secret lies in basic customer and product management. Enterprises need to understand the problems or issues their customers face and explain these problems using data. Disparate data sources often contain the information required to explain these problems and issues, or at least to understand the context in which they arise. These disparate data sources also often contain signals and attributes that can explain both kinds of change: slow manifestations and abrupt shifts.
The key is to start small, iterate and experiment with blending and analyzing across different data sources. This iterative approach can highlight attributes or events that can explain unexpected changes or problems reported by customers.
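As a rough illustration of what one small blending experiment might look like, here is a minimal Python sketch. Everything in it is hypothetical: the daily support-ticket counts stand in for an enterprise’s own data, the outage minutes stand in for an external data source, and the helper names `blend` and `pearson` are invented for this example. It joins the two sources on a shared date and checks whether the external signal moves with the customer-reported problem.

```python
from math import sqrt

# Hypothetical internal data: support tickets reported by customers per day.
tickets = [
    {"date": "2024-01-01", "tickets": 12},
    {"date": "2024-01-02", "tickets": 14},
    {"date": "2024-01-03", "tickets": 40},
    {"date": "2024-01-04", "tickets": 13},
    {"date": "2024-01-05", "tickets": 38},
]

# Hypothetical external data: minutes of outage reported by a third-party provider.
outages = [
    {"date": "2024-01-02", "outage_minutes": 5},
    {"date": "2024-01-03", "outage_minutes": 120},
    {"date": "2024-01-04", "outage_minutes": 0},
    {"date": "2024-01-05", "outage_minutes": 110},
    {"date": "2024-01-06", "outage_minutes": 0},
]


def blend(left, right, key):
    """Inner-join two lists of records on a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Keep only the days that appear in both sources, then compare the two signals.
blended = blend(tickets, outages, key="date")
r = pearson(
    [row["tickets"] for row in blended],
    [row["outage_minutes"] for row in blended],
)
print(f"{len(blended)} blended days, correlation = {r:.3f}")
```

A strong correlation in a toy experiment like this proves nothing on its own, but it is the kind of cheap first iteration that can tell a team whether a data source deserves a deeper look.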
The Gold Rush Will Hinge on “Variety”
The last unsolved problem for most enterprises is tapping the variety of available data sources to make customers happier and, in turn, improve the bottom line. Enterprises that prepare for this rush, implement the cultural and process changes, and invest in the appropriate technologies to blend and experiment with data will be the most successful.
The Last Frontier is not completely unexplored. Pioneers like Google, Netflix, and Amazon have shown that the value in data can be mined and exploited. In addition to their technology investments, these enterprises stand out because of their processes and cultural focus on enabling experimentation with data blending and analysis.
The faster enterprises can move and beat their competition to the Last Frontier of big data, the larger their share of the upcoming Gold Rush will be.
About the author: Kumar Srivastava has spent his career building big data and analytics products in the areas of social networking, online security, identity, reputation and trust management, online fraud and abuse, search and advertising, digital platforms, mobile applications and monetization services. Currently, Kumar is the Senior Director of Product Management at ClearStoryData, a cloud big data analytics startup based in Menlo Park, CA.