February 16, 2013

Big Data Backs World's Largest Lie Detector


While the ability to accurately predict future crime is still reserved for science fiction, the big data analytics approaches required for such predictions are steadily being shaped and refined.

The impetus behind much of this work isn’t crime prediction so much as massive-scale fraud detection. New methods have emerged to understand criminal networks, identity thieves and fraudsters who are bilking governments, healthcare organizations, insurance companies and others out of billions, and detecting them is becoming a booming business.

According to Jo Prichard, a lead data scientist for LexisNexis Risk Solutions’ HPCC Systems initiative, fraudsters have been pitted against some of the most comprehensive data collection efforts in history. What's unique here is that all of this data has deep context.

The standard data wells stacked with neat rows of names and addresses have been kicked up a notch. It's now possible to crunch personal, family and network histories via a wide swath of associations. For example, when an identity thief now files a false tax return under someone else’s name, there are hundreds of variables that help confirm whether that filer is who he claims to be. The system is, in essence, a lie detector on a global scale.

In this era of the massive graph, identifying fraudsters has moved beyond the standard personal investigation to a new “guilt by association” system of flagging. The graph’s long tentacles reach into so many sources that hiding connections and histories becomes almost impossible, especially as fresh data is fed in. Adding to this colossal fraud detection effort are increasingly smart algorithms that learn as more data is added and adapt accordingly. All of these aspects, run together on fine-tuned high performance systems, are making large-scale fraud almost impossible when the data conditions are right.
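As a rough illustration of the “guilt by association” idea, the Python sketch below flags everyone who sits within a small number of hops of a known fraudster in a relationship graph. The graph, the names and the two-hop cutoff are all invented for the example; the article does not describe how LexisNexis actually scores associations.

```python
from collections import deque

# Hypothetical relationship graph: each key is a person, each value is the
# set of people they share an address, asset or account with.
RELATIONSHIPS = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
    "erin": set(),
}


def flag_by_association(graph, known_fraudsters, max_hops=2):
    """Return {person: hops} for everyone within max_hops of a known fraudster."""
    hops = {person: 0 for person in known_fraudsters}
    queue = deque(known_fraudsters)
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the cutoff
        for neighbor in graph.get(person, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[person] + 1
                queue.append(neighbor)
    return hops


flags = flag_by_association(RELATIONSHIPS, ["alice"])
print(flags)
```

A breadth-first search is used here because it reaches each person by the shortest chain of relationships first, so the hop count can serve as a crude measure of how close someone is to known fraud.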

Prichard’s group, a unit within LexisNexis, provides massive-scale fraud detection as a data service, built on the company’s long-standing platform, to the government and several large companies to help them understand the roots of fraudulent activity. He told us that they currently have data from over 20 sources feeding into a central well that houses in-depth personal, work-related and asset histories for over 270 million Americans.

“We can see how people’s lives play out in data, which gives us this backdrop to understand what we expect to see from people and the more we learn from that at the granular level of detail,” Prichard said. In the end, the graph that this data creates yields around four billion relationships, a number that will continue to grow with the continuous updates to the system.

This resembles the much-discussed Facebook graph search, but on a far grander and more pervasive scale. Government agencies, insurance companies and banking entities are all in desperate need of fraud-fighting tools at incredible scale. What they require is a national lie detector powered by constant, evolving data streams, in which the more data is fed in, the more granular the analysis becomes.

Specifically, he described the process of forming a massive network as starting with fragments of identity. These snippets of a person’s identity are culled from 20,000 public and private sources (everything from deeds to DMV records to credit reports) and must be cleansed and integrated in a fashion that accommodates the variability of formats and types of information. From this, it becomes easier to piece together the fragments to complete the puzzle of one person. With that piece in place, the associations (cohabitation, shared assets, etc.) between families and networks can be built, creating a new layer of the puzzle on which to build yet another layer, such as an employment network.
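The steps described above (cleanse the fragments, assemble them into one person, then layer associations on top) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the field names, the sample records, and especially the matching key, which is a deliberately naive exact match on a normalized name plus four digits of an identifier rather than anything resembling the company's own linking technology.

```python
from collections import defaultdict

# Hypothetical identity fragments from different sources. Field names and
# values are invented for illustration.
FRAGMENTS = [
    {"id": 0, "source": "deed", "name": "John Smith", "ssn4": "1234", "address": "1 Oak St"},
    {"id": 1, "source": "dmv", "name": "JOHN  SMITH", "ssn4": "1234", "address": "9 Elm St"},
    {"id": 2, "source": "credit", "name": "Mary Jones", "ssn4": "5678", "address": "1 Oak St"},
    {"id": 3, "source": "dmv", "name": "john smith", "ssn4": "9999", "address": "4 Pine Rd"},
]


def normalize(name):
    """Cleansing step: collapse case and whitespace so formats agree."""
    return " ".join(name.lower().split())


def resolve_identities(fragments):
    """Group fragment ids that share a normalized name and ssn4 (assumed key)."""
    groups = defaultdict(list)
    for fragment in fragments:
        key = (normalize(fragment["name"]), fragment["ssn4"])
        groups[key].append(fragment["id"])
    return sorted(groups.values())


def shared_address_links(fragments, clusters):
    """Association layer: link identity clusters that share any address."""
    owner = {fid: idx for idx, ids in enumerate(clusters) for fid in ids}
    by_address = defaultdict(set)
    for fragment in fragments:
        by_address[fragment["address"]].add(owner[fragment["id"]])
    links = set()
    for people in by_address.values():
        ordered = sorted(people)
        links.update((a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:])
    return sorted(links)


clusters = resolve_identities(FRAGMENTS)
links = shared_address_links(FRAGMENTS, clusters)
print(clusters)  # fragment ids grouped per resolved person
print(links)     # pairs of cluster indexes that share an address
```

In this toy data the two differently formatted "John Smith" fragments with the same identifier collapse into one person, the third John Smith stays separate, and the shared Oak St address produces a single cohabitation link between the first John Smith and Mary Jones.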

The possibilities for examining personal relationships are rather staggering. Even for someone with a very common name, the associations across the work, credit, transaction, deed and licensing histories alone help narrow down the subject. “To do all this we [use] some really smart algorithms,” says Prichard. “We have a technology called LexID, which is really a linking technology based on a learning algorithm where the more data you give it, the more it learns, and the better it is able to resolve identity in the end.”

Prichard describes this smart algorithm as an ever-hungry, multi-armed entity that is constantly stitching together bits of the data fabric. With more data, it becomes simpler for the algorithm to properly decide if someone is John Smith from Pickle, Arkansas versus the other John Smith in the same town.

LexisNexis Risk Solutions, which is known in the big data realm for its distributed computing platform built under the HPCC Systems name, claims that it has built a coveted fraud detection service on this time-tested technology, which has served financial and other risk management needs for over a decade. Prichard told us that over time, they’ve been able to refine their graph-based analytics operation beyond the traditional rules-based engines and into a more dynamic system that is based on weighting and guidelines to help snap pieces into place.
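To make the contrast between a rules-based engine and a weighting-based one concrete, here is a minimal Python sketch using the two John Smiths of Pickle as the test case. The fields, weights and records are hypothetical and chosen only for illustration; the article gives no detail on the weights or guidelines LexisNexis actually uses.

```python
# Hypothetical field weights: how much agreement on each field counts
# toward a match. The values are invented, not taken from LexisNexis.
WEIGHTS = {"name": 1.0, "town": 0.5, "birth_year": 2.0, "employer": 1.5, "phone": 3.0}


def rule_match(a, b):
    """Rules-based check: same name and same town means same person."""
    return a["name"] == b["name"] and a["town"] == b["town"]


def weighted_score(a, b, weights=WEIGHTS):
    """Add the weight of each field both records have and agree on,
    and subtract the weight of each field they both have but disagree on."""
    score = 0.0
    for field, weight in weights.items():
        if field in a and field in b:
            score += weight if a[field] == b[field] else -weight
    return score


query = {"name": "John Smith", "town": "Pickle", "birth_year": 1970, "phone": "555-0101"}
candidate_1 = {"name": "John Smith", "town": "Pickle", "birth_year": 1970, "phone": "555-0101"}
candidate_2 = {"name": "John Smith", "town": "Pickle", "birth_year": 1985, "employer": "Acme"}

print(rule_match(query, candidate_1), rule_match(query, candidate_2))
print(weighted_score(query, candidate_1), weighted_score(query, candidate_2))
```

The fixed rule treats both candidates as the same person, while the weighted score separates them, and each extra field that both records carry gives the score more evidence to work with.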

LexisNexis Risk Solutions is but one entrant in the race to create a vast global network (a mega-graph) of relationship webs, which come into clearer focus as new iterations of data are processed. While their results are based on data from the United States, eventually such a global lie detector could be built as one entity, continuously snapping up feeds to round out the full extent of personal relationships, assets, history and associations. New data would be neatly plugged into the global graph to weave an intricate, broadly useful web at the international or personal level that only becomes more powerful with each new dash of data.

Other companies offer software services to help companies process their own data on a platform (rather than sending it to a company like LexisNexis to process and receiving a set of scores or values back). For instance, around this time last year we talked to SAS, which demonstrated its Visual Analytics platform for providing granular detail on complex strings of relationships and individuals. IBM and others have software that can enable similar graph-type approaches to the problem of fraud detection.

While government agents used to be the bane of fraudsters everywhere, now it’s data. With an endless stream of feeds arriving from an exhaustive list of sources, it has grown almost impossible to hide. Although many feel that sharing a lot of personal information is harmful (as if it were possible to avoid anyway), Prichard reminds us that in actuality, the more data we share, the easier it is to pinpoint our own identities in the global graph, making it far more difficult for fraudsters to claim our identity.
