Filling Cybersecurity Blind Spots with Unsupervised Learning
What you don’t know can hurt you. And when you’re processing millions of transactions per day, what you don’t know probably is hurting you. That’s why using unsupervised machine learning techniques to create a fuller picture of real cybersecurity risks in the big data era is a big priority — not just for banks, but for regulators and customers too.
The numbers around the financial impact of cybercrime are staggering: about $5 trillion in annual damage. One-fifth of Americans’ identities compromised. A ransomware attack every 14 seconds. It’s a wonder anybody goes online anymore.
As the stakes rise, so too does the technological investment. An arms race is well underway between the good guys and the bad guys to see who can better employ modern techniques — such as machine learning and AI — to claim cyber supremacy. Western governments are backing this investment to keep the Web safe for commerce and to defeat the cyber gangs, some of which have the backing of foreign governments.
One of the companies that’s been enlisted in the good fight is ThetaRay. Founded in 2013 by computer science professors Amir Averbuch of Tel Aviv University and Ronald Coifman of Yale University, the Israeli company has developed anomaly detection software that it claims can find signs of cybercrime, such as money laundering or fraud, hidden in huge masses of transactional data that other approaches can’t find.
Hiding in Plain Sight
ThetaRay uses an array of techniques, including supervised and unsupervised machine learning and link analysis, to detect patterns in the data that could be signs of fraud or criminal activity.
Mark Gazit, the company’s CEO, says a new approach is needed because legacy banking security solutions that rely on a complex web of rules and thresholds are unable to keep pace with the sophisticated schemes that criminals are putting into place today. Instead of trying to program an application to detect signs of fraud, money laundering, and human trafficking by defining them through a series of arbitrary values, ThetaRay uses machine learning to learn what those things look like.
“Our system works more like a human brain,” Gazit says. “When you recognize a person, you don’t go ‘The pattern of the eyes of this person is this number and the distance between the two ears is 10 inches and the size of the nose is 2 inches.’ No! You just look at the proportions. That’s what our system is doing. It basically ignores the values and looks at relationships, or inter-relationships between parameters.”
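Gazit’s “proportions, not values” idea can be sketched in code: instead of scoring raw fields, derive scale-free ratios between them and score those. The field names and thresholds below are invented for illustration; ThetaRay’s actual features are not public.

```python
# Hypothetical sketch: score relationships (ratios) between transaction
# parameters rather than the raw values themselves.

def ratio_features(txn):
    """Turn raw transaction values into scale-free ratios."""
    return {
        # how large is this amount relative to the account's own history?
        "amount_vs_avg": txn["amount"] / txn["avg_amount_30d"],
        # how busy is this account relative to its age?
        "txns_per_account_age": txn["txn_count_24h"] / max(txn["account_age_days"], 1),
    }

typical = {"amount": 120.0, "avg_amount_30d": 100.0,
           "txn_count_24h": 3, "account_age_days": 900}
odd = {"amount": 9000.0, "avg_amount_30d": 100.0,
       "txn_count_24h": 40, "account_age_days": 2}

print(ratio_features(typical))  # proportions close to normal
print(ratio_features(odd))      # proportions far out of line
```

The raw dollar amounts mean little on their own; the ratios expose that the second account is spending 90x its own norm from a two-day-old account.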
In a perfect world, every transaction would be subjected to a rigorous analysis from every possible angle. In this way, every illegitimate transaction would eventually be identified. Fraud would end, the good guys would keep their $5 trillion per year, and the lowly scammers would seek more gainful employment.
In the real world, spotting the bad transactions is much more difficult. While it is true that every fraudulent transaction carries at least one clue to its malignant nature, being able to identify that one malicious clue hidden amongst hundreds of legitimate features in time to act upon it is extremely hard to do.
500 Billion Points of Light
“Say a bank has one billion transactions in a month and every transaction has 500 parameters — name, amount, point of sale, IP address, account number, etc.,” Gazit says. “Now imagine 500 billion dots in the sky — or to be precise, one billion dots in a 500-dimensional sky. Then connect all the dots, like full mesh, and then look at all the permutations of connections.”
At a certain point, the possible number of computations becomes a major impediment to solving the problem. Even with the biggest supercomputers, taking a direct approach to crunch these numbers would take many years — even for a single day’s worth of transactions, Gazit says. In this manner, the volume and variety of data poses a major challenge to banks and others who need to identify signs of criminal enterprise in the financial infrastructure.
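Gazit’s full-mesh image can be quantified with a quick back-of-the-envelope calculation: a fully connected graph over n points has n(n-1)/2 links, so a billion dots yield roughly 5 × 10^17 pairwise connections before you even consider higher-order combinations.

```python
def full_mesh_edges(n):
    """Number of links in a fully connected (full mesh) graph of n points."""
    return n * (n - 1) // 2

n = 1_000_000_000               # one billion transactions
edges = full_mesh_edges(n)      # ~5 x 10^17 pairwise links
seconds = edges / 1e9           # at a billion comparisons per second
years = seconds / (3600 * 24 * 365)
print(f"{edges:.3e} edges, ~{years:.0f} years for one pass")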
ThetaRay co-founders Averbuch and Coifman developed novel machine learning algorithms that allow users to find patterns hidden in that morass of high-dimensional data. “What our professors have done is they’ve found a way to do this in real-time,” Gazit says. “It’s not neural network, not deep learning, not linear regressions. More about inner-connection and looking at the [confluence] of the data in hyper-dimensional space.”
The ThetaRay software employs a mix of supervised and unsupervised machine learning algorithms to flush out signs of the bad guys trying to hide in big data, says Jim Heinzman, the company’s executive vice president of financial services solutions.
“Our system doesn’t require any labels,” Heinzman says. “It doesn’t require any curated data. In fact, it can work with very dirty data. And it’s able to identify unknown unknowns. It will find the things that you don’t know how to train the system to find.”
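The label-free idea Heinzman describes can be illustrated with a minimal unsupervised sketch: learn what “normal” looks like from the unlabeled data itself, then flag whatever deviates. ThetaRay’s actual algorithms are proprietary and far more sophisticated; this toy z-score detector shows only the general principle.

```python
import statistics

def fit(rows):
    """Learn per-feature mean and standard deviation from unlabeled data."""
    cols = list(zip(*rows))
    return [(statistics.mean(c), statistics.pstdev(c)) for c in cols]

def anomaly_score(row, model):
    """Largest z-score across features: how far is this row from normal?"""
    return max(abs(x - mu) / sigma
               for x, (mu, sigma) in zip(row, model) if sigma > 0)

# Unlabeled transactions: (amount, transactions_in_last_hour).
# No one told the detector which rows are bad.
data = [(100, 2), (105, 3), (98, 2), (110, 4), (102, 3),
        (99, 2), (104, 3), (5000, 90)]

model = fit(data)
flagged = [r for r in data if anomaly_score(r, model) > 2.0]
print(flagged)  # the unusual transaction surfaces without any labels
```

The point is not the particular statistic but the workflow: no labels, no curated training set, and the system can surface patterns nobody thought to define a rule for.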
Late last year, ThetaRay was selected to be a component of Cyber NYC, a new endeavor by New York City government leaders to create a cybersecurity hub in the city. According to a November story in the New York Times, ThetaRay will occupy part of a 50,000-square-foot cybersecurity investment hub that’s being built in New York City, where it expects to have upwards of 100 employees.
ThetaRay, which has collected more than $60 million in funding, is ramping up its business. It counts companies like General Electric as customers, and a global bank is using the software to monitor ATM transactions. As bank fines mount over violations of Know Your Customer (KYC) and anti-money laundering (AML) requirements, financial institutions are motivated to find better solutions than the legacy status quo.
Regulators are beginning to realize that they’ve inadvertently stifled technological innovation through their requirements, Heinzman says. When major Wall Street banks like Citi have to employ 7,000 people just to handle the false positives generated by the legacy fraud detection systems, it’s clear that something isn’t working.
“The regulators, the banks, and the bad guys all know that the existing controls don’t work,” Heinzman says. “They’re not effective. They’re expensive. And it’s the first time in my recollection that the regulators have actually talked about the costs of false positives.”
In December, the Department of the Treasury’s Financial Crimes Enforcement Network (FinCEN) issued a joint statement calling for more technological innovation in AML, anti-cybercrime, and KYC initiatives. That statement started the clock ticking on legacy approaches.
“When that statement came out,” Heinzman says, “it was a very bad day for human traffickers, money launderers, and financial criminals because it made it very clear that everybody understands the problem, that the sophistication of their attacks has grown, and their ability to exploit these legacy controls has enabled them to perpetrate financial crimes in large global institutions, undetected and unhurried. And that dynamic is changing now.”