We’re in the era of big data – but what do we mean by that? In our view, big data is a relative, not absolute, term. It means that the organization’s need to handle, store and analyze data (its volume, variety, velocity, variability and complexity) exceeds its current capacity and has moved beyond the IT comfort zone. Big data is the classic double-edged sword – both potential asset and possible curse. Most agree that there is significant, meaningful, proprietary value in that data. But few organizations relish the costs and challenges of simply collecting, storing and transferring that massive amount of data. And even fewer know how to tap into that value, to turn the data into information.
Is the enterprise IT department merely an episode of TV’s “Hoarders” waiting to happen – or will we actually find ways to locate the information of strategic value that is getting buried deeper and deeper in our mountains of data? Quite simply: What are we going to do with all of this data?
At its essence, high-performance analytics offers a simple, but powerful, promise: Regardless of how you store your data or how much of it there is, complex analytical procedures can still access that data, build powerful analytical models using that data, and provide answers quickly and accurately by using the full potential of the resources in your computing environment.
With high-performance analytics, we are no longer primarily concerned with where the data resides. Today, our ability to compute has far outstripped our ability to move massive amounts of data from disk to disk. Instead, we use a divide-and-conquer approach to cleverly send the processing out to where the data lives.
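The divide-and-conquer idea can be sketched in a few lines of Python. This is an illustrative toy, not code from the chapter: the lists stand in for data partitions that, in a real grid, would live on separate machines, with only the small partial results traveling back to be combined.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions standing in for data that lives on separate
# nodes; in a real deployment each partition would stay on its own
# machine, and only the small partial result would travel.
PARTITIONS = [list(range(0, 250)), list(range(250, 500)),
              list(range(500, 750)), list(range(750, 1000))]

def local_sum_of_squares(partition):
    # The work shipped out to the data: each worker reduces its own chunk.
    return sum(x * x for x in partition)

def distributed_sum_of_squares(partitions):
    # Fan the work out in parallel, then combine the small partial results.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(local_sum_of_squares, partitions)
        return sum(partials)

print(distributed_sum_of_squares(PARTITIONS))  # same answer as a serial loop
```

The key design point is that the expensive reduction happens where each chunk of data resides; only tiny summaries cross the network.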
As our scenario at the beginning of this chapter illustrated, ultimately, HPA is about the value of speed and its effect on business behavior. If the analytic infrastructure requires a day to deliver a single computational result, you’re likely to simply accept the answer it provides. But if you can use HPA to get an answer in one minute, your behavior changes. You ask more questions. You explore more alternatives. You run more scenarios. And you pursue better outcomes.
But how do we bring the power of high-performance analytics to data volumes of this scale? We believe there are three basic pillars – three innovative approaches – to bring HPA to big data:
- Grid Computing: Distribute the Workload among Several Computing Engines – Grid computing enables analysts to automatically use a centrally managed grid infrastructure that provides workload balancing, high availability and parallel processing for business analytics jobs and processes. With grid computing, it is easier and more cost-effective to distribute compute-intensive applications and growing numbers of users across available hardware resources and to ensure continuous high availability for business analytics applications. You can create a managed, shared environment that processes large volumes of programs efficiently.
- In-Database Analytics: Move the Analytics Process Closer to the Data – With in-database processing, analytic functions are executed within the database engine using native database code. Traditional approaches often copy the data to a secondary location and process it outside the database. Benefits of in-database processing include reduced data movement, faster run-times, and the ability to leverage existing data warehousing investments.
- In-Memory Analytics: Distribute the Workload and Data Alongside the Database – In this approach, big data and intricate analytical computations are processed in-memory and distributed across a dedicated set of nodes, producing highly accurate insights for complex problems in near-real time. This approach applies high-end analytical techniques within the in-memory environment. For optimal performance, data is pulled into the memory of a dedicated database appliance for analytic processing.
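The in-database pillar can be illustrated with a minimal sketch, using SQLite purely as a stand-in engine (the table, columns and values are invented for illustration). The aggregation runs inside the database, and only the small summary result crosses to the application:

```python
import sqlite3

# SQLite as a stand-in database engine; an in-database analytics
# deployment would use the warehouse's own native functions instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 140.0), ("west", 90.0), ("west", 110.0)],
)

# Analytics pushed to the data: the engine computes the statistics,
# so the raw rows never leave the database.
rows = conn.execute(
    "SELECT region, COUNT(*), AVG(amount) FROM sales"
    " GROUP BY region ORDER BY region"
).fetchall()
for region, n, avg in rows:
    print(region, n, avg)
conn.close()
```

The same principle scales up: whether the statistic is an average or a scoring function for a predictive model, executing it where the data lives avoids the copy-out step entirely.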
What Does It Take to Succeed with High-Performance Analytics?
HPA isn’t simply an incremental discipline. It involves innovative shifts in how we approach analytic problems. We view them differently and continue to find new ways to solve them. It’s more than simply taking a serial algorithm and breaking it into chunks. Success requires deeper, broader algorithms in multiple disciplines and the ability to rethink our business processes.
In our experience, HPA solutions to complex business problems require innovation along two different dimensions. First, algorithms and modeling techniques must be invented and built to exploit the power of massively parallel computational environments in three major areas:
- Descriptive analytics – You can report and generate descriptive statistics of historical performance that help you see what has transpired far more clearly than ever before.
- Predictive analytics – You can use data relationships to model and forecast business results in impressive ways, predicting future events and outcomes.
- Prescriptive analytics – You can identify the relationships among variables to develop optimized recommendations that take advantage of your predictions and forecasts and foresee the likely implications of each decision option.
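The progression through these three areas can be made concrete with a deliberately tiny example (the demand figures, prices and costs are invented, and the straight-line forecast is a naive stand-in for the far richer models HPA systems actually use):

```python
import statistics

# Toy example (not from the chapter): six months of demand history.
history = [120, 132, 125, 140, 138, 150]

# Descriptive: summarize what has already happened.
mean_demand = statistics.mean(history)

# Predictive: fit a straight line through the trend by least squares
# and extrapolate one step ahead.
n = len(history)
xs = list(range(n))
mean_x = statistics.mean(xs)
slope = (sum((x - mean_x) * (y - mean_demand) for x, y in zip(xs, history))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_demand - slope * mean_x
forecast = intercept + slope * n  # predicted demand for the next month

# Prescriptive: choose the stocking level that maximizes expected
# profit given the forecast (hypothetical price, cost and options).
price, cost = 10.0, 6.0
options = [130, 140, 150, 160]

def expected_profit(stock):
    sold = min(stock, forecast)  # can sell no more than forecast demand
    return sold * price - stock * cost

best_stock = max(options, key=expected_profit)
print(round(mean_demand, 1), round(forecast, 1), best_stock)
```

Each layer builds on the one before it: the description feeds the prediction, and the prediction drives the recommendation.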
Second, HPA tools and products must be built, incorporating these high-performance analytics techniques, to serve the following three purposes:
- Visualization and exploration of massive volumes of data.
- Creation of analytical models that use multi-disciplinary approaches such as statistics, data mining, forecasting, text analytics and optimization.
- Application of domain-specific solutions to complex problems that incorporate both specific analytical techniques as well as the business processes to support decision making.
What makes HPA so compelling to businesses across the spectrum – and makes them willing to undertake this fundamental rethinking of analytics – is the ability to address and resolve transformational business problems that have the potential to fundamentally change the nature of the business itself. By processing billions of observations and thousands of variables in near-real time, HPA is unleashing power and capabilities that are without precedent. Your business could harness these capabilities, for example, by taking the following steps:
- Implementing a data mining tool that creates predictive and descriptive models on enormous data volumes.
- Using those models to predict which customers might abandon an online application and offering them incentives to continue their session.
- Comparing these incentives against one another and the budget, in real time, to identify the best offer for each customer.
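A hypothetical sketch of the last two steps: score a customer's abandonment risk, then pick the best affordable incentive. The scoring rule, offer names and economics are all invented for illustration – a real deployment would plug in a trained model and live session data.

```python
def abandonment_risk(session):
    """Stand-in for a trained predictive model: a simple scoring rule."""
    risk = 0.1
    if session["idle_seconds"] > 60:
        risk += 0.4
    if session["pages_completed"] < 2:
        risk += 0.3
    return min(risk, 1.0)

OFFERS = [  # (name, cost to us, estimated lift in completion probability)
    ("none", 0.0, 0.0),
    ("free_shipping", 5.0, 0.15),
    ("discount_10pct", 12.0, 0.30),
]

def best_offer(session, order_value, budget_left):
    """Prescriptive step: maximize expected net gain within budget."""
    risk = abandonment_risk(session)

    def net_gain(offer):
        _, cost, lift = offer
        if cost > budget_left:
            return float("-inf")  # unaffordable offers are ruled out
        return risk * lift * order_value - cost

    return max(OFFERS, key=net_gain)[0]

session = {"idle_seconds": 90, "pages_completed": 1}
print(best_offer(session, order_value=200.0, budget_left=20.0))
```

Note the real-time budget check: when the remaining budget rules out the richer incentives, the same logic gracefully falls back to the cheapest viable offer.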
That’s the kind of emphatic value that HPA can provide and why it’s continuing to garner the attention of many enterprises today.
Amazingly, the discipline of high-performance analytics continues to move forward at a rapid pace. As storage gets even more affordable and greater amounts of processing power become ever-cheaper, it’s easy for us to envision “analytical streaming” in real time where insights are not discrete events but are part of the minute-by-minute operation of the enterprise, woven into the fabric of every meaningful business process. Moving further down the cost curve will enable us to further democratize analytics and move it beyond the specialized analyst and into the hands of virtually every employee, increasing the breadth and depth of the value. By pushing out the power of this style of HPA, we have the opportunity to achieve exponentially outsized gains driven by new levels of rapid analysis.
Excerpted from Chapter 5 "Finding Big Value in big data: Unlocking the Power of High-Performance Analytics" by Paul Kent, Radhika Kulkarni, and Udo Sglavo in Big Data and Business Analytics, edited by Jay Liebowitz, ISBN 9781466565784. © 2013 Taylor & Francis Group LLC. Reprinted with permission.