Big Data-Powered Authentication Gives Security a Much-Needed Boost
Trillions of dollars are at stake in the ongoing battle against cybercriminals and fraudsters. Whether it’s Russian cybercriminals hacking into retail chains or a kid in New Jersey filing fraudulent claims, stolen credentials and user IDs often play a major role. But now, big data technologies like Hadoop are helping to prevent unauthorized access, both behind the corporate firewall and in public coffers.
Technological advances are a two-way street when it comes to crime on the Internet. A gang of cyber thieves operating out of Russia is said to have used parallel processing techniques to extract personally identifiable information from thousands of vulnerable servers over the course of the past year. Luckily, the good guys also have powerful tools at their disposal to shrink the attack area and expose the criminals’ signals hiding in all the noise. It’s now a big data arms race between criminals and the rest of us, and the ante just went up.
One company that’s hoping to simultaneously put a cap on internal and external threats is Fortscale, which today announced the availability of a Hadoop-based security monitoring product. The San Francisco-based company uses proprietary machine learning algorithms running in Hadoop to score the security risk of every user in an organization, and then monitor user behavior for significant deviations from that score.
Fortscale’s approach gives companies another defense against evolving cyber threats, says Idan Tendler, the co-founder and CEO of Fortscale. “If you think about the biggest threats out there–the Target breach or even [Edward] Snowden-like insider threats–eventually the user identity will be used as the main vehicle of attack,” Tendler tells Datanami. “You may have so much data already collected about the user…but you don’t have visibility into what the user is doing.”
Fortscale uses the parallel processing power of Hadoop to create and monitor a profile for each user, or group of users, who has access to a company’s internal applications. If a user deviates significantly from his historical profile or from the profile of similar peers, the software sets off alarms. This approach requires deep knowledge of enterprise security, which the Fortscale team developed while working for Israeli military and intelligence agencies and security software startups.
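To make the idea of profile deviation concrete, here is a minimal sketch in Python. It is not Fortscale's method, which is proprietary and not described in detail; it swaps in a plain z-score as a stand-in. The function names, the single "daily activity count" feature, and the threshold of 3.0 are all assumptions made for illustration.

```python
from statistics import mean, stdev


def deviation_score(baseline, observed):
    """Return how many standard deviations `observed` lies from the baseline mean.

    `baseline` is a list of past measurements (at least two values),
    for example a user's daily count of file accesses. Hypothetical feature.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


def is_anomalous(user_history, peer_history, observed, threshold=3.0):
    """Flag activity that deviates from the user's own history OR from peers.

    The threshold is an assumed value, not one taken from the article.
    """
    return (
        deviation_score(user_history, observed) > threshold
        or deviation_score(peer_history, observed) > threshold
    )


# Invented sample data: daily file-access counts.
user_history = [10, 12, 11, 9, 10, 13, 11]
peer_history = [8, 15, 12, 10, 9, 14, 11, 13]

normal_day = is_anomalous(user_history, peer_history, 12)
suspicious_day = is_anomalous(user_history, peer_history, 250)
```

A real system would track many features per user and learn its thresholds from data. The sketch shows only the shape of the comparison: one score against the user's own history, one against a peer group, and an alarm if either is exceeded.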
It also requires having the flexibility to adapt to the intricacies of individual user behaviors. While security information and event management (SIEM) products have done a lot to centralize the collection of logs, the rules-based tools don’t provide the needed granularity when it comes to monitoring individual user behavior.
“We look at this from the user’s perspective,” Tendler says. “He has a name, a personality, and habits. This user is sloppy or this user is risky or this user tends to have too many permissions and so on. You have to look at the user history and profile his behavior. And only in those methods can you spot odd behavior and can pinpoint malicious users or compromised users whose credentials were stolen.”
Fortscale’s software was tested at a Fortune 100 firm and is now generally available. It’s been proven to run atop Cloudera’s Hadoop distribution but will soon run on Hortonworks too.
Another security software company launching a Hadoop-based security offering this week is Solutionary, which is a subsidiary of NTT Group. The company’s new service offering, called ActiveGuard Security and Compliance Platform, rides atop MapR Technologies’ Hadoop distribution to provide fast and fine-grained analysis of large amounts of structured and unstructured data.
Using Hadoop enables ActiveGuard to analyze and respond to bigger sets of data than traditional security tools, the company claims. It’s able to detect patterns of behavior, anomalous activities, and attack indicators from a variety of sources, including network devices, security appliances, hosts, endpoints, applications, and databases.
“With bad actors increasing the sophistication of their attacks, enterprises are having a difficult time pinpointing the threats and vulnerabilities that pose the largest risk,” Don Gray, Solutionary’s chief security strategist, says. “Because Solutionary has positioned the big data storage in front of our analytics processing, we are able to take advantage of big data analytics in real time and accelerated investigation of threats and across clients–in addition to benefiting from the usual big data use-case of providing archiving and post-processing batch analysis.”
Big Data Fraud Fighters
Another big data authentication solution that’s gaining momentum is the High Performance Computing Cluster (HPCC) offering from LexisNexis. The data management company developed the HPCC platform more than 13 years ago–well before Hadoop was a gleam in Doug Cutting’s eye–and today uses it as the heart of various information services.
HPCC holds upwards of 15PB of publicly accessible data that LexisNexis uses to power various offerings, including an identity authentication and fraud detection engine that was recently adopted by the New Jersey Department of Labor and Workforce Development.
The HPCC solution helps the NJDLWD prevent people from fraudulently claiming unemployment insurance benefits they’re not entitled to. The New Jersey system asks HPCC to do two things, says Monty Faidley, director of market planning for LexisNexis. First, it asks HPCC to determine if an identity presented in a claim is real. Second, it asks HPCC to determine if that identity is owned by the person making the claim.
“We ask that question through a series of knowledge-based authentication questions that are also based on our big data public records solution,” Faidley says. “So we can work back across this identification’s public record history and ask questions like ‘What address did you share with this person six years ago’ or ‘What automobile did you have at this address?’”
The very fact that LexisNexis can ask those types of questions (and knows the answers to them) is a testament to the broad reach of its information gathering processes. According to Flavio Villanustre, vice president of infrastructure and security at LexisNexis’s HPCC Systems subsidiary, the company collects data on all Americans through more than 10,000 different data sources.
“The sources are quite diverse,” Villanustre says. “From information that comes from the credit header [from the credit bureaus]…to department of motor vehicle information, utility information, and phone company information.” LexisNexis does not have access to call detail records, but it does know what phone number you had and what color your Yugo was when you were living in Newark in 1992. All that information helps LexisNexis confirm your identity.
“We use a system that we designed and built from scratch and implemented as part of HPCC called Scalable Automated Linking Technology,” Villanustre tells Datanami. “SALT builds all the links using information in the data itself, the semantics of data. Depending on the frequency of the data tokens and the co-location of the data tokens and the occurrence of them in different files, you can determine if multiple records are part of the same individual or a different individual.”
SALT does this automatically and then generates a confidence score. The NJDLWD was able to use that confidence score to determine whether a person filing a claim was likely to be who they said they were, or a fraudster looking to steal from the state. Since the NJDLWD started using the HPCC service two years ago, it has stopped more than 600 fraudulent claims. Had they gone through, the state would have paid out more than $4.5 million in fraudulent claims.
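The linking idea Villanustre describes, in which rare data tokens count for more than common ones when deciding whether two records belong to the same person, can be sketched in Python. This is not SALT, whose internals the article does not describe; it substitutes a simple rarity-weighted overlap score. The function names, the field-tagged tokens, and the sample records are all invented for illustration.

```python
import math
from collections import Counter


def token_weights(records):
    """Weight each token by rarity: log(N / number of records containing it).

    A token that appears in every record gets weight 0 and so says
    nothing about identity; a token seen only once gets the highest weight.
    """
    n = len(records)
    counts = Counter(tok for rec in records for tok in set(rec))
    return {tok: math.log(n / c) for tok, c in counts.items()}


def link_confidence(a, b, weights):
    """Rarity-weighted overlap of two records, a value between 0 and 1.

    Computed as (weight of shared tokens) / (weight of all tokens in either
    record). This is an assumed stand-in for a real linking score.
    """
    sa, sb = set(a), set(b)
    shared = sum(weights.get(t, 0.0) for t in sa & sb)
    total = sum(weights.get(t, 0.0) for t in sa | sb)
    return shared / total if total > 0 else 0.0


# Invented records; each token is tagged with the field it came from.
r1 = {"first:john", "last:smith", "city:newark", "phone:555-0101"}
r2 = {"first:john", "last:smith", "city:trenton", "phone:555-0101"}
r3 = {"first:john", "last:smith", "city:newark", "phone:555-0199"}
r4 = {"first:mary", "last:smith", "city:newark", "phone:555-0142"}

weights = token_weights([r1, r2, r3, r4])
same_phone = link_confidence(r1, r2, weights)      # share a rarer phone number
same_city_only = link_confidence(r1, r4, weights)  # share only common tokens
```

In this toy data every record has the surname "smith", so that token carries no weight, while a phone number shared by only two records carries a lot. That is why r1 and r2 score higher than r1 and r4, even though both pairs share two tokens.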