AIOps Becomes a Must-Have Tool In Age of COVID-19
“Is it Monday or Saturday?” You can be excused for forgetting which day of the week it is. After all, the days have sort of blurred together during the COVID-19 lockdown. It’s a minor change at the individual level, but at the population level, the lockdown has spurred a major shift in usage patterns. For major telecommunications companies and website operators, the shift has rendered rules-based monitoring approaches next to worthless.
Ira Cohen, the founder and chief data scientist of AIOps software company Anodot, has watched global access patterns for telecommunications and Web-based applications shift dramatically during the COVID-19 pandemic.
“We’d see metrics that exhibit a weekly pattern all of a sudden have a completely different signal pattern, because now everybody is at home, and they’re using [the Internet and computers] in a different way,” he says.
Companies that manually built their IT and application monitoring systems using a rules-based approach or using a simple statistical approach have really struggled to keep up with the changes, he says.
“If I designed my rules to look at things week over week, or to look at the same hour or the same day, all of a sudden that broke,” he says. “There’s no notion of difference between weekends and weekdays.”
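To make the failure mode concrete, here is a minimal sketch of the kind of hand-tuned rule Cohen describes. It is an illustration only: the traffic levels, the `EXPECTED_BANDS` thresholds, and the function names are hypothetical and are not taken from any system mentioned in this article.

```python
# Hypothetical hand-tuned bands (requests per hour); not from any real system.
EXPECTED_BANDS = {"weekday": (80.0, 120.0), "weekend": (30.0, 50.0)}


def day_type(hour_index):
    """Classify an hour index (hour 0 = Monday 00:00) as weekday or weekend."""
    day_of_week = (hour_index // 24) % 7
    return "weekend" if day_of_week >= 5 else "weekday"


def static_rule_alerts(series):
    """Return the hour indices whose value falls outside the fixed band."""
    alerts = []
    for hour_index, value in enumerate(series):
        low, high = EXPECTED_BANDS[day_type(hour_index)]
        if not low <= value <= high:
            alerts.append(hour_index)
    return alerts


def synthetic_week(weekday_level, weekend_level):
    """One week of hourly traffic: five flat weekdays, then two weekend days."""
    return [weekday_level] * (5 * 24) + [weekend_level] * (2 * 24)


# Synthetic data: four ordinary weeks, then four weeks in which
# weekdays and weekends look alike.
pre_lockdown = synthetic_week(100.0, 40.0) * 4
lockdown = synthetic_week(70.0, 70.0) * 4

print(len(static_rule_alerts(pre_lockdown)))  # 0
print(len(static_rule_alerts(lockdown)))      # 672, i.e. every hour of all four weeks
```

In this toy setup, once the weekday and weekend levels converge, the fixed bands flag every hour indefinitely. Real problems get buried in the noise until someone retunes the thresholds by hand.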
The situation came to a head recently in one country, where a major telecommunications company that Anodot was courting suffered serious performance degradation. The company failed to detect the problems promptly using its traditional rules-based system. And if it weren’t for the fact that the CEO of the company happened to live in one of the neighborhoods impacted, the problem might have gone undetected indefinitely.
“Even after four to five weeks, their network was just getting hammered and they were missing these things that were happening, loads on the network in various places and in various applications,” Cohen says. “All their rules went out the door. They weren’t functioning. They were basically missing that there’s a performance issue in this region, and all of it was going to the press.”
Luckily, the company had signed a deal to acquire Anodot software in February. When the CEO raised the problem with his CTO, they accelerated the Anodot rollout so they could detect similar problems in neighborhoods where the CEO doesn’t happen to live.
AIOps solutions that use machine learning algorithms to detect anomalies buried in huge amounts of network and log data are no panacea. Machine learning models can’t anticipate unprecedented shifts in network usage like the one we’ve seen with COVID-19. But given enough time, an ML-based system will catch up to the new usage pattern and be able to separate legitimate network and application problems from the new signals and the new pattern.
“If you have a machine learning-based method, initially of course it takes a week or two to adjust to the new pattern,” Cohen says. “But it gets adjusted automatically, by itself, and there’s no effort. That’s what you want.”
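The adjustment Cohen describes can be illustrated with a much simpler stand-in than a production machine learning model: a baseline that keeps learning. The sketch below assumes an exponentially weighted moving average per hour-of-week slot. That technique, the `alpha` and `tolerance` values, and the synthetic traffic are all assumptions made for illustration and do not represent Anodot’s algorithms.

```python
HOURS_PER_WEEK = 24 * 7


def adaptive_alerts(series, alpha=0.5, tolerance=0.2):
    """Flag hours that deviate from a per-slot baseline that keeps learning.

    Each hour-of-week slot keeps an exponentially weighted moving average.
    `alpha` and `tolerance` are illustrative values, not tuned settings.
    """
    baseline = {}
    alerts = []
    for hour_index, value in enumerate(series):
        slot = hour_index % HOURS_PER_WEEK
        if slot not in baseline:
            baseline[slot] = value  # the first week only seeds the baseline
            continue
        expected = baseline[slot]
        if abs(value - expected) > tolerance * expected:
            alerts.append(hour_index)
        # Update even after an alert so the baseline follows lasting shifts.
        baseline[slot] = alpha * value + (1 - alpha) * expected
    return alerts


def synthetic_week(weekday_level, weekend_level):
    """One week of hourly traffic: five flat weekdays, then two weekend days."""
    return [weekday_level] * (5 * 24) + [weekend_level] * (2 * 24)


# Four ordinary weeks, then four lockdown weeks with a flat usage pattern.
traffic = synthetic_week(100.0, 40.0) * 4 + synthetic_week(70.0, 70.0) * 4
incident_hour = 7 * HOURS_PER_WEEK + 10  # a Monday morning in the final week
traffic[incident_hour] = 20.0            # simulated outage

alerts = adaptive_alerts(traffic)
alerts_per_week = [
    sum(1 for a in alerts if a // HOURS_PER_WEEK == week) for week in range(8)
]
print(alerts_per_week)  # [0, 0, 0, 0, 168, 48, 0, 1]
```

On this synthetic data the detector is noisy for the first two weeks after the shift, then goes quiet on its own and still catches the simulated outage in the final week. That is roughly the behavior the quote describes, with no one editing thresholds by hand.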
Companies can continue to rely on rules-based approaches to alert them to network and application anomalies. But in most cases, they will have to manually change the alerts to account for the new consumer usage patterns stemming from the COVID-19 lockdown. That’s not easy, as the alerts are typically spread out across many different systems.
“There are a lot of these rules,” Cohen says. “It’s not one thing. It’s in a lot of places. So they have to go back and change many, many things.”
As the lockdown is lifted, companies that rely on rules-based approaches will need to tweak the rules to account for the new usage patterns. And if there’s a second or a third wave of coronavirus infections later this year and into 2021, they will need to figure out what those patterns are and manually change their alerts again.
“There was a seminal shift that happened when all the world started going into lockdown for a period of a few weeks,” Cohen says. “Now the patterns are still changing depending on the geography, the location, based on political decisions. And it probably will keep on changing.”
Of course, if COVID-19 vanishes, companies will be free to use the same rules they had before coronavirus. But who’s willing to make that bet?
“Who knows when the rules they had pre-COVID are maybe going to be applicable again?” Cohen says. “A year? In the interim, things are going to change and you need something that continuously changes with it, otherwise you’re lost. You’re basically blind.”
Editor’s note: This article has been corrected. Ira Cohen is the chief data scientist of Anodot, not its CEO. Datanami regrets the error.