December 3, 2014

Stomping Out Criminal Scams with Hadoop

The growing technical sophistication of criminals is leading to an arms race to see who can scale more quickly to outmaneuver the other side. Cybercriminals are increasingly adopting hyperscale techniques to help them perpetrate fraud faster and more efficiently than ever before. That’s led the good guys to seek new capabilities of their own, including using Hadoop.

One of the promising startups that’s looking to use the power of Hadoop and big data to put the kibosh on fraudsters and scam artists is Argyle Data. The San Mateo, California company was founded 18 months ago with the goal of developing a Hadoop-based application that uses machine learning technology to detect patterns in real-time log data that are good indicators of criminal activity.

Argyle is focused on using its big data technology and expertise to help telecommunications firms and banks analyze petabytes worth of data to flush out hard-to-detect scams as quickly as possible. Its software is based on open source frameworks, including Hadoop, the Accumulo key-value store based on Google’s BigTable, and the Presto distributed SQL engine developed by Facebook, to quickly chew through petabytes of log data, including packet data extracted by network probes. On top of these core components, the company developed its own machine learning algorithms and complex analytics to extract actionable information from a massive amount of fast-moving data.

The combination of these big data technologies and techniques gives Argyle’s customers an edge against cybercriminal rings that are pulling in tens of billions of dollars per year, says Argyle Data CEO Tom Ryan.

“Fraud’s been around a long time and there are a lot of systems in place to fight it,” Ryan tells Datanami. “But they tend to operate in batch and they then tend to be very specialized and focused and they tend not to leverage big data or be able to handle multi-petabytes of input.”

With Argyle’s software running on a Hadoop cluster, banks and phone companies have a better shot at detecting crimes as they occur in real time. Argyle is particularly good at sensing phone scams, such as the call-back scam, where fraudsters set up a premium phone number and trick people into calling it, often by dialing victims but only letting the phone ring once. When the victim calls back out of curiosity, they’re immediately hit with a $5 phone bill. Thanks to high performance computers and other illegal power-ups (like hacking PBXs), criminals can rake in serious dough very quickly.
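To make the call-back pattern concrete, here is a minimal Python sketch of how such a scam might be spotted in call records. It is not Argyle's method, which the article does not describe in detail. The `CallRecord` fields, the function name `find_one_ring_suspects`, and all three thresholds are assumptions chosen for illustration. The sketch flags any number whose short, unanswered calls are followed by callbacks from several distinct people.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical call record; real call detail records carry many more fields."""
    caller: str
    callee: str
    start: float         # call start time, in seconds
    ring_seconds: float  # how long the phone rang
    answered: bool


def find_one_ring_suspects(records, max_ring_seconds=3.0,
                           callback_window=3600.0, min_victims=3):
    """Map each suspect number to the sorted list of numbers that called it back.

    All thresholds are illustrative assumptions, not tuned values.
    """
    # Index short, unanswered calls by (origin, target) -> list of start times.
    missed = defaultdict(list)
    for r in records:
        if not r.answered and r.ring_seconds <= max_ring_seconds:
            missed[(r.caller, r.callee)].append(r.start)

    # A callback is a later call from the target back to the origin
    # that falls inside the callback window.
    victims = defaultdict(set)
    for r in records:
        for t in missed.get((r.callee, r.caller), ()):
            if 0 < r.start - t <= callback_window:
                victims[r.callee].add(r.caller)
                break

    # Keep only numbers that drew callbacks from enough distinct people.
    return {num: sorted(v) for num, v in victims.items() if len(v) >= min_victims}
```

A batch function like this only illustrates the signal. Catching the scam as it occurs would mean maintaining the same counts incrementally over a stream of records.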

Fraud sucks 5 to 10 percent of revenue out of phone companies (and eventually phone companies’ customers) and costs us $46 billion per year globally, according to Ryan. “What’s happening is fraud is modern day terrorism,” he says. “It’s the new generation of organized crime. They’re very sophisticated, they have a lot of money, and in some cases they’re more technically astute than the IT organizations they’re attacking.”

In financial services, Argyle is helping to stop everything from debit and credit card fraud and trade surveillance to account takeovers and insider fraud. “These are the key use cases that are emerging as high priorities and ones that are well suited for the stack,” says Ryan, who adds that Argyle is currently in talks with two of the three largest retail banks.


The Argyle architecture

Hadoop gives Argyle and its customers an edge against these types of criminals. “We use Hadoop to do real time ingestion of packets at really high scale,” says Argyle’s chief marketing officer Ian Howell. “Lincoln Labs, for example, ran a benchmark using Accumulo and found it can do 100 million ingests per second. That’s really powerful. We can ingest, enrich, and feature-ize data in real-time to make machine learning very effective. That’s one of the reasons we chose Accumulo.”

The “secret sauce” behind Argyle’s approach lies in the machine learning algorithms and complex analytics it uses to detect anomalous behavior. The nature of fraud is constantly changing, Ryan says, and it’s necessary to adapt quickly to changing patterns of fraud.

The algorithms were also infused with so-called “adversarial learning” capabilities that are designed to break through a criminal’s camouflage. “Criminals don’t behave like normal people, even when they try to imposter someone,” Ryan says. “So if a 22 year old in Eastern Europe is trying to imposter a 65-year-old in Alabama, he’s not going to get it right. So we look for adversarial behaviors.”

Argyle uses Hadoop to help phone companies stop call-back fraud


Argyle isn’t the only startup looking to turn Hadoop into a big-data fraud fighting platform. But the company insists that its mix of deep packet inspection, powerful key-value data store, and distributed SQL query capabilities makes it unique and gives it a scalability advantage over other approaches.

The company recently cleared $4.5 million in venture funding and unveiled a partnership with Hadoop distributor Hortonworks last week. John Kreisa, vice president of strategic marketing at Hortonworks, praised Argyle’s support for the Apache Ambari framework and “its commitment to providing simple to roll out, data intensive Hadoop based applications.”

Related Items:

Fighting Crime with Big Data

Fighting Telephone Fraud with Data Analytics

Eight Ways Analytics Powers Fraud Detection

IBM Flushes Out Fraud with Big Data Analytics
