January 13, 2014

Fighting Telephone Fraud with Data Analytics

Alex Woodie

The call came in just as Sheila B. started her shift at the bank’s call center. The customer sounded distressed. He was having money problems, he said, and he needed to tap his home equity line of credit. Even though he’d forgotten his account number and PIN, he knew his social security number, so Sheila B. completed the $50,000 transfer as requested. A day later, she learned she’d been duped by a sophisticated ring of fraudsters operating out of Slovenia, but by then the money was gone.

Tales like Sheila B.’s are common in the financial world. One out of every 2,500 calls that are placed to call centers of major financial companies is a fraud call, and phone fraud accounts for $1.8 billion in losses each year, according to industry estimates. It’s virtually impossible to train customer service representatives (CSRs) like Sheila B. to deal with fraud calls. Their jobs are to be polite and helpful to customers, not to treat them with suspicion.

A company called Pindrop Security has come up with a different way to fight the fraud. Instead of erecting more security walls and authentication mechanisms–which will slow fraudsters, but not stop them–Pindrop identifies whether a call is legitimate or fraudulent based on the audio signal from the call itself. It’s an innovative use of big data analytics technology that has the potential to reduce the instances of telephone fraud.

The approach is called “phoneprinting,” says Pindrop CEO Vijay Balasubramaniyan. “At a fundamental level what we’re doing is looking for characteristics that your phone device and the networks inadvertently introduce into the audio of the call,” he tells Datanami. “Currently there are about 147 features that we extract from the audio of the call, and we use this to essentially identify the device making the call.”

Blueprinting the World’s Phones

The phoneprinting technology uses machine learning algorithms to analyze every call coming into a call center and compare it against those 147 features. Pindrop needs about 15 seconds of audio (and another three seconds of processing time) to make its determination on each call. When it flags a call as likely fraudulent (often because of where the call originated), the call is either routed to a special team of highly trained CSRs or allowed to proceed with any money transfers blocked.

Some of the 147 features that Pindrop looks at are obvious ones. For example, if a caller has blocked their Caller ID, that’s a big red flag. Likewise, if the Caller ID shows a call originating from a phone booth in Lagos, Nigeria, that also would generate an alert. But there are many other features that are not so obvious that tell Pindrop the signal is being manipulated in some way.

Each telephone carrier treats its signal differently, and Pindrop can use this data to identify where a call has originated. Every international call goes over a VOIP network at some point, and the level of voice encoding (usually 10 to 30 milliseconds) is a particularly rich source of information for Pindrop to mine. Packet and frame loss also help identify networks, as does the artificial background noise that some carriers insert into the audio signal, that “swish” sound that helps prevent people from thinking they’ve been disconnected.

The company claims to be able to identify the source of a call down to an area the size of France. “What we’re doing is using our Phoneprinting technology to blueprint the entire world based on its audio characteristics,” Balasubramaniyan says. “These fraudsters are coming from really weird places, like Nigeria, where the telecommunications infrastructure is not very strong, so the artifacts are very clear.”

Big Data Hurdles

Pindrop faced a big data challenge due to the amount of data being collected and the speed at which a decision must be made. Some of the large call centers that it works with handle more than 1 billion calls every year. Being able to handle that scale and continue to adapt the fraud models to keep one step ahead of the criminals required some fancy footwork on the part of Balasubramaniyan and his team of developers.

Pindrop tried a range of technologies, including Hadoop, before settling on MySQL to store the copy of the phone prints database that runs in an appliance at the client’s call center. “MySQL provides significant ACID properties that we need for enterprise customers,” Balasubramaniyan says. “We’ve done a lot of clever things with MySQL in order to handle the massive amounts of data. But we haven’t made the transition to NoSQL yet.”

Each call occupies about 30 KB of space after the characteristics are extracted from the audio. Then, the algorithms have about three seconds to match that file against the local database, which may be in the 10 to 30 TB range. “We use aggressive caching technology, like memcached and things like that, to be able to do this really, really quickly,” Balasubramaniyan says.
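Pindrop has not published how its caching layer works, but the general idea of putting a fast cache in front of a slower database can be sketched with the common cache-aside pattern. In the Python sketch below, an in-process LRU dictionary stands in for memcached and a plain dictionary stands in for the MySQL table; `LRUCache`, `PhoneprintStore`, and the record fields are hypothetical names chosen for illustration only.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-process stand-in for a cache such as memcached (assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


class PhoneprintStore:
    """Cache-aside lookup: try the cache first, fall back to the database."""

    def __init__(self, database, cache):
        self.database = database  # dict standing in for the MySQL table
        self.cache = cache
        self.db_reads = 0  # counts how often the slow path was taken

    def lookup(self, print_id):
        cached = self.cache.get(print_id)
        if cached is not None:
            return cached
        self.db_reads += 1
        record = self.database.get(print_id)
        if record is not None:
            self.cache.set(print_id, record)  # populate the cache for next time
        return record


# Hypothetical record; the real phoneprint schema is not public.
database = {"print-001": {"region": "A", "fraud_reports": 3}}
store = PhoneprintStore(database, LRUCache(capacity=1000))
first = store.lookup("print-001")   # goes to the database
second = store.lookup("print-001")  # served from the cache
```

After the two lookups, `store.db_reads` is 1: only the first request touched the database, which is the effect a cache is meant to have on a tight per-call time budget.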

The size of the models also poses a problem. “Essentially what we’re trying to do is figure out, in a multi-dimensional space, where the fraudsters lie and where the legitimate phone calls lie,” he says. “If you could load up all that info in memory, it becomes a simpler problem. But in our case, we can’t. So we have to create partial models and then collate the models together.”
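One simple way to picture “partial models” that are later collated is a nearest-centroid classifier, whose per-class sums and counts can be computed shard by shard and then added together. This is an illustrative stand-in, not Pindrop’s actual model; the function names, the two-feature vectors, and the labels below are all assumptions.

```python
def fit_partial(rows):
    """Build a partial model from one shard of (features, label) pairs.

    Returns {label: (sum_vector, count)}, which is all that is needed
    to combine shards later without reloading the raw data.
    """
    stats = {}
    for features, label in rows:
        total, count = stats.get(label, ([0.0] * len(features), 0))
        total = [t + f for t, f in zip(total, features)]
        stats[label] = (total, count + 1)
    return stats


def collate(partials):
    """Merge partial models into one centroid per label."""
    merged = {}
    for stats in partials:
        for label, (total, count) in stats.items():
            if label in merged:
                m_total, m_count = merged[label]
                merged[label] = (
                    [a + b for a, b in zip(m_total, total)],
                    m_count + count,
                )
            else:
                merged[label] = (list(total), count)
    return {
        label: [t / count for t in total]
        for label, (total, count) in merged.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to the feature vector."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Two shards that never have to be in memory at the same time.
shard_a = [([0.1, 0.2], "legit"), ([0.9, 0.8], "fraud")]
shard_b = [([0.2, 0.1], "legit"), ([0.8, 0.9], "fraud")]
centroids = collate([fit_partial(shard_a), fit_partial(shard_b)])
label = classify(centroids, [0.85, 0.9])
```

Because sums and counts add exactly, the collated centroids match what a single pass over all the data would produce. More complex models generally do not merge this cleanly, which is presumably part of what makes the real problem hard.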

CEO and co-founder Vijay Balasubramaniyan developed the technology behind Pindrop’s offering while at Georgia Tech.

Keeping the models and machine learning algorithms current within the limits of time and space also requires some tricks of the trade. “Let’s assume you have 10 million calls, and you get the 10 million and first call. You can’t retrain the entire model based on that 10,000,001st call,” he says. “What you have to do is incremental classification. You have to figure out clever things that allow you to make that incremental classification.”
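The article does not say which incremental technique Pindrop uses. As a generic illustration of the idea, the Python sketch below uses online logistic regression, in which each newly labeled call nudges the existing weights with a single gradient step instead of triggering a full retrain. The class name, the learning rate, and the two-feature toy data are assumptions made for the example.

```python
import math


class OnlineLogisticModel:
    """Logistic regression updated one labeled call at a time (illustrative)."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        """Estimated probability that a call with these features is fraud."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, is_fraud):
        """One stochastic gradient step using only the newest call."""
        error = self.predict_proba(features) - (1.0 if is_fraud else 0.0)
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


model = OnlineLogisticModel(n_features=2)

# Toy stream: the first feature is set on fraud calls, the second on legitimate ones.
for _ in range(200):
    model.update([1.0, 0.0], is_fraud=True)
    model.update([0.0, 1.0], is_fraud=False)

fraud_like = model.predict_proba([1.0, 0.0])
legit_like = model.predict_proba([0.0, 1.0])
```

Each `update` call costs time proportional to the number of features, no matter how many calls came before, so the 10,000,001st call is as cheap to absorb as the first. After the toy stream above, `fraud_like` ends up above 0.5 and `legit_like` below it.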

Down with Evil Fraudsters

Balasubramaniyan developed the phoneprinting technology as a graduate student in the Computer Science school at Georgia Tech, and founded Pindrop to bring the technology to market. The company received $11 million in Series A venture funding earlier this year, and its technology is now in use at two of the largest brokerages in the US and two of the largest banks, says Matt Anthony, the company’s vice president of marketing.

Financial firms that are tired of absorbing the losses associated with fraud no longer have to consider it just a cost of doing business. “This is a whole lot of new data on phone calls that we haven’t had before,” Anthony says. “We’re finding that it’s pretty easy to tell the difference between a good guy and a bad guy when you have that data.”

As companies tighten the security of their e-commerce websites, fraudsters are increasingly turning to phones to perpetrate their criminal activities. Successful phone fraudsters today are using a combination of resources to break their targets, including data gleaned through identity theft (such as the 70 million customer records Target recently lost) and social hacking skills.

In many cases, a company’s toll-free telephone line may be its weakest link, with fortifications tough to build. “I feel for these customer service reps, because what they’re getting measured on is customer satisfaction, and they’re dealing with people who might be frustrated or have money problems,” Anthony says. “If they can alert them before the call comes in and say, ‘There are 10 things that are weird about this call,’ then they have a fighting chance.”

Pindrop looks poised to capitalize on big data technologies and make a decent living by keeping fraudsters out of bank accounts. But for Balasubramaniyan, keeping the data models and algorithms up to date is a labor of love, because he knows it’s being used to keep criminals from stealing other people’s money.

Love is a motivator, and so is hate. Anthony related the tale of one of Pindrop’s customers, who took the unusual step of inviting his biggest competitor into his office to show them the Pindrop technology. “Somebody asked him, ‘Why would you do this for your biggest competitor?’” Anthony says. “And the guy responded, ‘Everybody hates their competitor. But I hate fraudsters a lot more than I hate my competitor.’”
