Advanced Analytics, Big Data, and APM
If application performance management (APM) is going to achieve its objectives in a world of composable applications and dynamic workloads, it needs granular data about both the workloads and their environment. This means that APM solutions need to enter the world of big data and advanced analytics.
There are two distinct analytic approaches to application performance monitoring, distinguished by how much data they collect. The older approach, developed when storage was more expensive and data-collection agents consumed too many processor resources, relies on a sampling of performance measures collected at predetermined intervals. It may collect more granular data when a certain trigger occurs, but in general the data is a subset. Algorithms then fill in the gaps in this smaller dataset to arrive at a relevant diagnosis. Finding and solving performance issues with this method often requires administrators to adjust the frequency and quantity of data collected based on their hypotheses.
The more recent approach relies on modern computing designs and storage capacities to collect raw, unsampled data, resulting in datasets that are often too large for traditional tools to evaluate. With the new approach to big data and APM, the analytical challenge is no longer the creation of models and iterative experiments but finding and connecting relevant information without being overwhelmed by the sheer volume of data. Machine learning techniques are being successfully applied to this challenge, delivering insights into application health and predictive alerts that help DevOps teams quickly address pending issues before they impact customer experience. It’s no wonder that innovative APM vendors are now approaching APM from a big data perspective.
Insufficient Data – Does Not Compute
One of the most common phrases used by an artificial intelligence in popular culture is some combination of “insufficient data, does not compute.” While intended as satire about the computer’s inability to extrapolate or deal with some type of paradox, the saying has a kernel of truth in it. If we are trying to figure out what is causing performance issues in an application, it is difficult to do this quickly and correctly with insufficient data.
In their recent Magic Quadrant report on Application Performance Monitoring Suites, Gartner analysts state:
“IT operations and other organizations are already awash with data, and this situation will likely only become more challenging with the growth of cloud, microservices, IoT and so on.”
If IT operations are already hitting big data volumes, artificial intelligence and machine learning tools are a necessity, as many other departments and industries have already discovered.
Big Data Analytics Are Different
Applying advanced analytics to big data for APM objectives is quite different from working with sampled data and opens up several new opportunities for finding and resolving issues. Some of the most powerful examples are diagnosing intermittent problems, analyzing an application’s full ecosystem, and performing forensic audits.
Tracking down intermittent problems, elusive failures, or unpredictable occurrences in an environment that is always changing is the bane of every DevOps and IT Operations team. In increasingly dynamic environments, where containers and microservices have very short lifespans, sampled data collection could miss all or part of an activation, leaving the team completely in the dark.
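The risk of sampling past a short-lived workload can be illustrated with a minimal sketch. It assumes a hypothetical collector that polls at a fixed interval and a handful of invented container lifetimes; none of the names or numbers come from a real APM product.

```python
from dataclasses import dataclass


@dataclass
class ContainerRun:
    name: str     # hypothetical container identifier
    start: float  # seconds since the start of the observation window
    end: float    # seconds since the start of the observation window


def observed_by_sampling(runs, interval, horizon):
    """Return the names of runs that are alive at one or more sample instants.

    Samples are taken at 0, interval, 2 * interval, ... up to horizon.
    """
    sample_times = [i * interval for i in range(int(horizon // interval) + 1)]
    seen = set()
    for run in runs:
        if any(run.start <= t < run.end for t in sample_times):
            seen.add(run.name)
    return seen


# Invented lifetimes, for illustration only.
runs = [
    ContainerRun("checkout-7f", start=5.0, end=130.0),  # long-lived
    ContainerRun("resize-a1", start=61.0, end=64.5),    # lives between two samples
    ContainerRun("batch-22", start=118.0, end=121.0),   # happens to straddle a sample
]

sampled = observed_by_sampling(runs, interval=60, horizon=180)
missed = {run.name for run in runs} - sampled
print("seen by 60-second sampling:", sorted(sampled))
print("never observed:", sorted(missed))
```

In this toy timeline the 3.5-second container never appears in the sampled data, while an equally short one is caught only because it happened to overlap a sample instant. Continuous collection removes that dependence on timing.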
With continuous collection of diagnostic and performance data, the triage process is based on a complete data set rather than on assumptions drawn from past experience or data samples. Advanced analytics and machine learning make short work of these tasks, filtering out unnecessary data and enabling the team to quickly find the critical issues, without having to set up collectors targeting specific failure conditions in advance.
The success of triage depends heavily on the quality of the forensic data. Incomplete data can lead the team down multiple paths, researching multiple candidates in the search for the root cause or root causes of an application problem. No one sitting in a war room ever said, “We need less data to find the root cause!”
Analyzing the Full Ecosystem
Applications depend on a large ecosystem of libraries, virtual machines, microservices, APIs, networks, storage, and data. Without this shared ecosystem, many (most?) of our applications would not exist, but sharing definitely makes it more difficult to isolate application performance problems.
With a big data approach to APM, development and support teams can discover patterns across the entire ecosystem, not just a single application or component. Conversely, if a problem is discovered with one application, analytics can easily identify similar relationships across the entire data set and flag likely suspects.
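One simple way to picture this kind of cross-application analysis is a lookup over a dependency map. The sketch below is illustrative only: the application names, component names, and the two helper functions are hypothetical, and a real analytics engine would infer these relationships from transaction data rather than a hand-written dictionary.

```python
def shared_suspects(dependency_map, slow_apps):
    """Return the components that every slow application depends on."""
    dependency_sets = [set(dependency_map[app]) for app in slow_apps]
    return set.intersection(*dependency_sets) if dependency_sets else set()


def apps_sharing_dependency(dependency_map, suspect):
    """Return, in sorted order, every application that uses the suspect component."""
    return sorted(app for app, deps in dependency_map.items() if suspect in deps)


# Hypothetical ecosystem: application -> components it calls.
dependency_map = {
    "storefront": {"auth-api", "inventory-db", "cdn"},
    "mobile-api": {"auth-api", "inventory-db"},
    "reporting": {"auth-api", "warehouse-db"},
    "admin": {"inventory-db"},
}

# Two applications are reported as slow; what do they have in common?
suspects = shared_suspects(dependency_map, ["storefront", "admin"])
print("likely suspects:", sorted(suspects))

# Which other applications are exposed to the same component?
for component in sorted(suspects):
    print(component, "is also used by", apps_sharing_dependency(dependency_map, component))
```

Here the only component common to both slow applications is the shared database, and the second lookup shows that a third application is exposed to the same risk even though nobody has reported a problem with it yet.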
Organizations taking an ecosystem approach often find that by optimizing a few common problem areas they get as much or more performance improvement in less time than if they work on each application individually.
Performing Forensic Audits
Using high-resolution data to solve performance problems more quickly frees up time that organizations can use to look for unforeseen behaviors, unintended consequences, and other strange activities. Regular forensic analysis also helps to reduce the performance bloat that is a natural consequence of continuous upgrades and feature additions.
With data available for every transaction at every step, advanced analysis techniques can calculate how much time and how many resources are spent by every method or external call, which calls are used the most, and which cost the most. Going the other direction, drilling down into a specific incident or user enables the support team to find edge cases, and even isolate suspicious or malicious usage that could result in serious data loss or service outage.
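The per-method accounting described above amounts to a group-and-sum over unsampled call records. The following is a minimal sketch under stated assumptions: the record format (method name, duration in milliseconds) and the sample values are invented for illustration, and a production system would run this kind of aggregation on a distributed data store rather than an in-memory list.

```python
from collections import defaultdict


def summarize_calls(records):
    """Aggregate call count and total duration (ms) per method."""
    stats = defaultdict(lambda: {"calls": 0, "total_ms": 0.0})
    for method, duration_ms in records:
        stats[method]["calls"] += 1
        stats[method]["total_ms"] += duration_ms
    return dict(stats)


# Hypothetical call records: (method or external call, duration in ms).
records = [
    ("CartService.add", 12.0),
    ("CartService.add", 15.0),
    ("CartService.add", 9.0),
    ("PaymentGateway.charge", 240.0),
    ("InventoryDB.query", 40.0),
    ("InventoryDB.query", 35.0),
]

stats = summarize_calls(records)
most_used = max(stats, key=lambda m: stats[m]["calls"])
most_costly = max(stats, key=lambda m: stats[m]["total_ms"])

print("most used:", most_used, stats[most_used])
print("most costly:", most_costly, stats[most_costly])
```

In this made-up data the most frequently called method is not the most expensive one, which is the kind of distinction that only becomes visible when every call is recorded.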
Where To Now
As with anything new, there is still a fair amount of fear and uncertainty about using big data and advanced analytics with APM. The two most common fears are scalability and overhead. Capturing detailed records for every transaction can amount to petabytes of data, but that is no more than many other industries already handle in successful big data operations. The scale may be large, but there are documented best practices for how to architect and store APM data.
At the same time, performance data is of little use if the act of collecting the measurements consumes a significant percentage of available capacity. Distributed architectures, data compression, and lightweight agents ensure that transactions can be observed and recorded without impacting the user’s experience.
APM needs big data and advanced analytics. As Gartner reports, “The growing modularity, dynamism, and geographical dispersion of infrastructure and application stacks coupled with their growing importance to enterprise revenue generation has increased the attractiveness of a big data, AI-enabled approach to APM (and monitoring in general).” Expect more vendors to follow suit.
About the author: Gayle Levin is director of solutions marketing at Riverbed Technologies. Previously, she held product marketing and campaign roles at VMware, Oracle, and Splunk as well as several startups. Her interests lie in the impact of technology on the way we think and work today.