November 16, 2015

Machine Learning’s Big Role in the Future of Cybersecurity

Information security has always been a cat-and-mouse game. The good guys build a new wall, and the bad guys figure out a way over it, under it, or around it. But lately, the bad guys seem to be circumventing our walls with greater and greater ease. Stopping them will require a quantum leap in the capabilities of the good guys, and that could mean more widespread use of machine learning technology.

It may surprise the casual observer, but machine learning is not widely leveraged in the IT security field at this time. Notwithstanding credit card fraud detection systems and network device makers that are using advanced analytics, the systems that automate common security activities in practically every large company–such as detecting malware on your PC or spotting malicious activity on a network–largely rely on humans to properly code and configure them, security experts say.

While there has been extensive academic research into the use of machine learning technology in cybersecurity, we’re just now beginning to see security tools that actually leverage the technology in the field. Startups like Invincea, Cylance, Exabeam, and Argyle Data are leveraging machine learning techniques to power security tools that are faster and more accurate than what major security software vendors offer today.

Data Mining Malware

Josh Saxe, a principal research engineer at Fairfax, Virginia-based Invincea, says it’s time to move on from the old signature- and file hash-based approaches that were developed in the 1990s.

“Anti-virus companies to my knowledge have done some dabbling in machine learning, but their bread and butter is still signature-based detection,” Saxe tells Datanami. “They’re detecting malware based on file hashes or based on pattern matching that a human analyst comes up with to detect a given sample.


Invincea’s advanced malware detection system is based in part on DARPA’s Cyber Genome project

“While they’re successful at detecting old malware that has been seen in the past, they’re not specifically good at detecting new malware, which is part of the reason there’s an epidemic in cybercrime going on today,” he continues. “Nation-states and actors that would like to break into your computer are able to do that very successfully even when you have AV installed, because the signature-based methods don’t work.”
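The weakness Saxe describes is easy to see in a minimal sketch of hash-based detection. This is an illustration only, not any vendor's implementation: the signature set, the sample bytes, and the function name `is_known_malware` are all made up for the example.

```python
import hashlib

# Hypothetical stand-in for a previously analyzed malicious file.
known_sample = b"MZ\x90\x00 pretend-malicious-payload"

# Hypothetical signature database: SHA-256 hashes of samples seen in the past.
SIGNATURES = {hashlib.sha256(known_sample).hexdigest()}


def is_known_malware(data: bytes) -> bool:
    """Flag a file only if its exact hash is already in the signature set."""
    return hashlib.sha256(data).hexdigest() in SIGNATURES


# A variant that differs by a single appended byte produces a different hash,
# so the lookup misses it even though the payload is otherwise identical.
variant = known_sample + b"\x00"

print(is_known_malware(known_sample))
print(is_known_malware(variant))
```

The old sample is caught, while the trivially modified one is not, which is why exact-match signatures struggle with malware that has never been catalogued.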

At Invincea, Saxe leads a project to build a better malware detection system using techniques gleaned from machine learning. The project, which is part of DARPA’s Cyber Genome project, essentially uses machine learning to mine malware for insights, including reverse engineering how malware works, performing social network analysis on the code, and using machine learning-based systems to rapidly score new malware samples in the wild.

“We’ve shown empirically that the method we developed that uses a machine learning-based approach is better than the AV system,” he says. “Machine learning systems are able to automate the work that human analysts have been doing, and even improve upon it.  When you combine that with a huge amount of training data, it turns out you can beat the traditional signature-based systems at the detection problem.”

Invincea uses a deep learning approach to accelerate the training of the algorithms. Saxe currently trains them on about 1.5 million samples of benign and malicious software, running the training on GPUs with Python tools. As the library grows to 30 million samples, he expects the advantage to grow in a linear fashion.

“The more training data we have available and the larger that mountain of malware available for training machine learning systems, the more advantage machine learning systems will have in the race to detect malware better,” he says.
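The train-then-score workflow described above can be sketched with a toy classifier. The article does not detail Invincea's models, so this example swaps in a much simpler technique, a nearest-centroid classifier over byte histograms, purely to show the shape of the idea. The feature choice, the function names, and the tiny made-up samples are all assumptions.

```python
from collections import Counter
import math


def byte_histogram(data: bytes):
    """Return the relative frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(b, 0) / total for b in range(256)]


def centroid(vectors):
    """Average a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(benign, malicious):
    """'Training' here is just computing one centroid per labeled class."""
    return (centroid([byte_histogram(s) for s in benign]),
            centroid([byte_histogram(s) for s in malicious]))


def score(model, sample):
    """Return a value in [0, 1]; higher means closer to the malicious centroid."""
    benign_c, malicious_c = model
    features = byte_histogram(sample)
    d_benign = distance(features, benign_c)
    d_malicious = distance(features, malicious_c)
    if d_benign + d_malicious == 0:
        return 0.5
    return d_benign / (d_benign + d_malicious)


# Made-up training data: text-like bytes versus opcode-like filler bytes.
benign = [b"hello world, this is a plain text config file",
          b"print the quarterly report and save it to disk"]
malicious = [b"\x90\x90\x90\x90\xcc\xcc\xeb\xfe\x90\x90",
             b"\xcc\x90\x90\xeb\xfe\x90\xcc\x90\x90\x90"]
model = train(benign, malicious)

# Neither of these samples appears in the training data.
unseen_suspicious = b"\x90\x90\xeb\xfe\xcc\x90\x90\x90\x41"
unseen_text = b"save the file and print it"
print(score(model, unseen_suspicious), score(model, unseen_text))
```

Unlike a hash lookup, the model scores samples it has never seen, and its class centroids are estimated from labeled data, which is the intuition behind Saxe's point that more training data helps.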

The current plans call for Invincea to add the deep learning-based capabilities to its end-point security product in 2016, Saxe says. Specifically, it will be added to Cynomix, a feature of the end-point security product, which already uses machine learning techniques.

Malicious User Detection

Machine learning also stands to help the good guys in another facet of IT security: detecting malicious internal users and identifying compromised accounts.

Just as the major antivirus products rely on a catalog of signatures to identify malware, user activity monitoring tools also lean on signatures. And just as signature-based detection is starting to fail for malware, it’s also not performing well in the user activity monitoring space.

“Historically, the security officers in enterprise have relied heavily on products that use a signature-based approach, like IP address blacklisting,” says Derek Lin, Chief Data Scientist at Exabeam, a provider of user behavior analysis tools.

“They’re looking for things that already happened,” he continues. “The problem with the signature based approach is they only get to see the signatures after they have happened. These days, security officers are very much focused on detecting malicious events that don’t have signatures.”


Exabeam builds a picture of user behavior by tracking user activity across sessions, devices, IP addresses, and credentials.

Savvy cybercriminals today know they can defeat signature-based approaches by slightly altering their tactics. So if the intrusion detection system maintains an IP blacklist, the cybercriminal breaks it by continuously jumping back and forth among a large number of domains at his disposal.

Instead of sticking with a defensive scheme from yesteryear, Exabeam advocates taking a proactive approach based on Gartner’s concept of User Behavior Analytics (UBA). The idea behind UBA (or the related concept of User and Entity Behavior Analytics) is that there’s no way to know which users or machines are good or bad. So you assume they’re all bad, that your network has been compromised, and you constantly monitor and model everything’s behavior to find the bad actors.

That’s where the machine learning algorithms come in. Lin and his team use a variety of supervised and unsupervised machine learning algorithms to detect anomalous patterns of user behavior, as gleaned from a variety of sources, like server logs, Active Directory entries, and virtual private networking (VPN) logs.

“It’s all about profiling the user behavior. The question is how to do that,” Lin tells Datanami. “For every user and entity on the network, we try to build a normal profile–this is where the statistical analysis is involved.  And then on a conceptual level, we’re looking for deviations from the norm….We use the behavior based approach to find anomalies in the system and surface them up for the security analyst to look at.”
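The "normal profile, then deviation" idea Lin describes can be sketched with basic statistics. This is a minimal illustration under stated assumptions, not Exabeam's method: the feature (distinct hosts a user logs in to per day), the user names, the numbers, and the threshold are all hypothetical.

```python
from statistics import mean, stdev


def build_profile(history):
    """Summarize a user's past daily counts as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def deviation(profile, observed):
    """Return how many standard deviations the observation is from the user's norm."""
    mu, sigma = profile
    if sigma == 0:
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


# Hypothetical feature: distinct hosts each user logged in to per day.
history = {
    "alice": [3, 4, 3, 5, 4, 3, 4],
    "bob": [10, 12, 11, 9, 10, 11, 12],
}
profiles = {user: build_profile(days) for user, days in history.items()}

# Today's activity: 12 hosts is ordinary for bob but far outside alice's norm.
today = {"alice": 12, "bob": 11}

THRESHOLD = 3.0  # arbitrary cut-off chosen for the example
flagged = [user for user, count in today.items()
           if deviation(profiles[user], count) > THRESHOLD]
print(flagged)
```

Because each user is compared against their own baseline, the same raw count can be normal for one account and anomalous for another, which a fixed blacklist or global rule cannot express.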

Security’s ML Future



Considering the number of major security breaches we’ve suffered, and the creative ways that cybercriminals are finding their way into supposedly secure systems, the good guys could use a break. Could that break come from machine learning? It very well could, says Patrick Townsend, CEO and founder of security software vendor Townsend Security.

“Now that we’re starting to get systems that can really effectively handle examining large amount of very unstructured data and detecting patterns, I’m hoping that the next wave of security products will be based on cognitive computing,” he says. “Look at Watson. If it can win Jeopardy, why can’t it parse all these security events worldwide and make sense of them? I think we’re on the very early cusp of the use of cognitive-based computing to help ramp up security.”

Invincea’s Saxe hopes to ride that wave. “It’s not that surprising to me that the incumbent companies haven’t caught onto this wave and productized algorithms based on these new deep learning approaches,” he says. “It’s just now becoming possible to train the kind of machine learning models that we’re using. You couldn’t have done this effectively 10 years ago.”

