Datanami
August 2, 2016

Machine Learning Brings Real Insight to Jordan’s Virtual Environment


As one of New England’s largest furniture chains, Jordan’s Furniture knows a bit about volume. But when the volume of data stored on its virtualized IT infrastructure grew past a certain size, the company found that the added complexity made it hard to troubleshoot potential problems. That’s when it turned to an advanced analytics toolset for greater clarity.

The rise of virtualization products and hypervisors has transformed the IT industry and greatly reduced the threat of x86 server sprawl. Instead of running dozens of individual server instances on standalone hardware, organizations can use a hypervisor like VMware’s ESXi to house each server instance in its own virtual machine (VM), with many VMs sharing a single physical server.

While server virtualization bolsters hardware utilization and drives higher efficiency, it has downsides too. For instance, it can be very difficult to troubleshoot application performance issues when workloads are “hidden” behind the virtual curtain. In particular, tracking the actual I/O of application workloads as they traverse host network adapters into storage area networks (SANs) is hard in a virtualized environment.

This is the problem that Jordan’s Furniture ran into with its VMware environment. The popular East Taunton, Massachusetts company relies on VMware software to virtualize about 110 cores of its Hewlett-Packard (NYSE: HPE) servers running in a 7000 series blade chassis. The company splits this physical hardware into about 100 production VMs, which are used to run various back-office workloads on the Windows Server OS, including SQL Server databases, Exchange, Web and file servers, and Active Directory. The HP blades are connected via Fibre Channel to a Hitachi SAN with about 30 TB of capacity.

The lack of insight into the virtual environment bothered Ethan Peterson, a network engineer with Jordan’s Furniture. “It wasn’t any glaring problem we were having in the virtual environment,” Peterson tells Datanami. “It was more we didn’t have visibility.”

Unexpected Issues

Jordan’s employees occasionally complained that applications were responding slowly. At Jordan’s, which runs seven big showrooms in the Northeast, these complaints usually came on weekends, when furniture buying peaks.

Trying to find the root cause of any application issue was tough, says Peterson, who has a VMware certification. “We don’t really have a way to look at the historical data in real time,” he says. “Is it something on the storage going wrong, or too much bandwidth being used on the network, or something with a CPU? We didn’t really have any analytics to do that.”

Peterson considered using vCenter Ops, an operations console sold by VMware. While that product provided detailed metrics on VM performance, the asking price was more than Jordan’s wanted to pay for what it needed. “We wanted the basic information, the monitoring and just the knowledge of what was going on to be able to troubleshoot something,” he says.

That’s when Peterson heard about a new product called SIOS IQ, which was designed specifically to monitor virtual environments. Jordan’s has long used SIOS’s SQL Server data replication software for disaster recovery and high availability, but the SIOS IQ product was new to Peterson.

First released in 2015, SIOS IQ uses machine learning algorithms to build a model of each user’s IT environment, specifically of how CPU, storage, and network usage varies for each VM. If one of these metrics falls outside its normal range, SIOS IQ raises an alert.
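The article does not describe SIOS IQ’s algorithms beyond this. To make the general idea of “learn a normal range per VM and metric, then flag readings outside it” concrete, here is a minimal sketch that swaps in a simple textbook technique: a rolling mean and standard deviation (z-score) baseline. It is not SIOS IQ’s method, and the class name, metric names, window size, and three-sigma threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional


@dataclass
class Alert:
    vm: str
    metric: str
    value: float
    low: float   # lower edge of the learned normal range
    high: float  # upper edge of the learned normal range


class BaselineMonitor:
    """Hypothetical per-(VM, metric) baseline using a rolling z-score.

    An illustrative stand-in only, not the model SIOS IQ actually uses.
    """

    def __init__(self, window: int = 50, k: float = 3.0, min_samples: int = 10):
        self.k = k                      # width of the normal range, in std devs (assumed)
        self.min_samples = min_samples  # samples needed before alerting (assumed)
        # One bounded history per (vm, metric) series; old samples age out.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, vm: str, metric: str, value: float) -> Optional[Alert]:
        """Record a sample; return an Alert if it lies outside the learned range."""
        series = self.history[(vm, metric)]
        alert = None
        if len(series) >= self.min_samples:
            mu, sigma = mean(series), stdev(series)
            low, high = mu - self.k * sigma, mu + self.k * sigma
            if not (low <= value <= high):
                alert = Alert(vm, metric, value, low, high)
        # The sample joins the baseline either way, so a lasting shift
        # gradually becomes the new normal instead of alerting forever.
        series.append(value)
        return alert


if __name__ == "__main__":
    mon = BaselineMonitor()
    # Hypothetical datastore latency readings (ms) for one VM.
    for ms in [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3, 5.1, 4.9]:
        mon.observe("sql-01", "datastore_latency_ms", ms)
    print(mon.observe("sql-01", "datastore_latency_ms", 40.0))
```

Keeping a separate history per VM and metric mirrors the article’s point that normal behavior varies for each VM; a production system would likely need seasonality handling (for example, the weekend peaks mentioned earlier), which this sketch ignores.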

While Peterson didn’t have any pressing performance problems that needed to be resolved, the lack of visibility into the virtual environment left him feeling uneasy. “We were lucky enough not to have too many problems,” he says, “but that didn’t mean we didn’t want to know if something was coming up.”

Latency Spikes

Peterson installed SIOS IQ on his network and was impressed almost immediately. “It was much easier to use [than vCenter Ops]. The information there was pretty good,” he says. “When we demoed it, we actually found some problems from a storage standpoint. That was pretty neat.”


SIOS IQ connects the dots between application performance and CPU, storage, and network issues

It turns out that Peterson had detected one of those “sleeper” issues in the VMs that could have turned into something worse down the line. It involved how storage capacity is provisioned within the VMware hypervisor, and the latency spikes that can result.

When Peterson provisioned the SAN storage capacity in the hypervisor, he assigned individual VMs to virtual data stores managed in the hypervisor. There wasn’t a good way to determine how many VMs to put in each virtual data store, or how big the data stores needed to be to optimize performance.

In short, it was a bit of a hit-and-miss exercise that carried performance repercussions, even for an experienced VMware engineer like Peterson. “If there are too many machines on that data store, it can cause latency because one is trying to read or write a bunch of data and other ones can’t, and it falls out of acceptable threshold,” he says.

But now, with SIOS IQ on the case, Peterson has actual data to back up his virtual data store decisions. That has led him to reorganize his VM environment, reducing the number of VMs assigned to each data store and making the data stores smaller.

“Before we were grouping them by function,” he says. “But now we’ve figured out that’s probably not a good way to group it. Now we’re working on splitting them out to see if it helps improve performance…That’s what I’m doing right now, and it’s solely because SIOS IQ gave us the information to know to do that. Otherwise, we would have just kept doing the same thing we were doing.”

Black Boxes

The inner workings of the VMware hypervisor remain a black box to Jordan’s Furniture. This isn’t unique to VMware; the same is true for users of hypervisors from IBM (NYSE: IBM), Microsoft (NASDAQ: MSFT), Oracle (NYSE: ORCL), and others.

SIOS IQ can’t peek behind the hypervisor curtain to see how things are running. Instead, it must infer the state of the system from secondary effects. Using machine learning and graph analytics, SIOS has found a way to make that inference work better than the status quo.

For folks like Peterson who are tasked with running business systems day to day, this capability helps ensure that virtualization-related problems show up on the radar, even when they don’t know exactly what to look for.
