May 2, 2017

Masking Technical Complexity in the Security Data Lake


Today’s growing cybersecurity threat demands a sophisticated response, one that increasingly involves big data technologies like parallel file systems and machine learning. However, some security experts warn that the growing number and complexity of big data security tools could be hindering the cause.

In the UK, a group of data engineers at the security startup Panaseer have devised a novel approach to this big data dilemma. The Hadoop-based platform they developed, called the Panaseer Security Data Lake, aims to centralize and standardize security-related data to bolster cyber intelligence.

Panaseer’s main focus is providing clarity and better situational awareness of security threats to decision-makers, says Charaka Goonatilake, CTO and co-founder of Panaseer.

“There are plenty of tools and a whole range of systems to help security teams find bad things,” Goonatilake tells Datanami, “but there aren’t really enough tools to help the security leadership understand what the risk is.”

A New Security Umbrella

Panaseer hopes to mask the technical complexity of security point tools – such as next-gen security information and event management (SIEM) products, user behavior analytics (UBA) tools, configuration management databases (CMDBs), vulnerability scans, firewalls, a variety of network devices, and antivirus tools – by providing a central repository of security data that can be used to generate reports for C-level executives and directors.

These folks at the top of the corporate food chain tend to be non-technical people who have neither the time nor the inclination to understand the nuances of the cyber threats facing today’s businesses. They also aren’t well-positioned to understand how their chief information security officer (CISO) is responding to existing threats, and preparing for the next one.

While these point tools are increasingly necessary for corporations to combat emerging cybersecurity threats, their siloed nature and unique data models hamper customers’ ability to view their collective risks in a single light, and thereby coordinate responses.

“We want to try and avoid the proliferation of point solutions that all need their own copy of the same database,” Goonatilake says. “It’s all the same data. All these use cases can be serviced by the same data set.”

Standard Data Model

The Panaseer Security Data Lake is not just another amorphous piece of plumbing or some soon-to-be-forgotten metadata participant. It’s an application in its own right, providing BI and reporting solutions that just happen to use data from a variety of security tool end-points.

But in order to accomplish that task, Panaseer had to get down into the technical weeds and do the hard, thankless work of creating a standard data model.

“We’re very heavily involved in coming up with standard data models that allow us to write analyses in a way that is agnostic of where the data came from,” Goonatilake says.

One of the big problems that many organizations face today is they’re consuming data from lots of different security products, he says. “It’s important for us to be able to have a single set of analytics that can be deployed regardless of where the data is coming from,” he adds.
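
To make the idea concrete, here is a minimal sketch of a source-agnostic analysis. It is not Panaseer’s data model or code: the two scanners, their field names, the `Finding` record, and the severity thresholds are all hypothetical, chosen only to show how per-source mappers can feed one shared analysis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical common record; the field names are illustrative only."""
    host: str
    severity: str  # normalized to "low", "medium", or "high"
    source: str    # which tool produced the record


def from_scanner_a(row: dict) -> Finding:
    # Assumption: scanner A reports a numeric score from 0 to 10 as a string.
    score = float(row["score"])
    severity = "high" if score >= 7 else "medium" if score >= 4 else "low"
    return Finding(host=row["hostname"].lower(), severity=severity, source="scanner_a")


def from_scanner_b(row: dict) -> Finding:
    # Assumption: scanner B reports a text label and an uppercase asset name.
    return Finding(host=row["ASSET"].lower(), severity=row["RISK"].lower(), source="scanner_b")


def high_severity_by_host(findings: list[Finding]) -> Counter:
    # Source-agnostic: this reads only fields of the common model.
    return Counter(f.host for f in findings if f.severity == "high")


raw_a = [{"hostname": "Web01", "score": "9.1"}, {"hostname": "db01", "score": "3.0"}]
raw_b = [{"ASSET": "WEB01", "RISK": "HIGH"}, {"ASSET": "APP02", "RISK": "Medium"}]

findings = [from_scanner_a(r) for r in raw_a] + [from_scanner_b(r) for r in raw_b]
report = high_severity_by_host(findings)
print(dict(report))
```

In this sketch, supporting a third product would mean writing one more mapper function; `high_severity_by_host` would not change.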

Panaseer isn’t the only big data software company that’s identified the need for a standard security data model. Both Hortonworks and Cloudera are backing similar projects to create data models for security data held in Hadoop. That includes Apache Metron, which is backed by Hortonworks, and Apache Spot, which is backed by Cloudera and was formerly called ONI.

While Goonatilake applauds the work of the Hadoop distributors, he says that neither Metron nor Spot is mature enough at this time for commercial deployments. And the pace of development in the two projects is too slow at this point for Panaseer to throw its weight behind them.

Getting the data model right will be critical for Panaseer and its customers to succeed, Goonatilake says. “Our data models will continue to remain a core part of our platform because all of our analytics depend on it,” he says. “We’ve actually put in the effort to understand [and clean] the raw data sources and map it into something that’s rich and high quality for our customer to use.”

While Goonatilake says he hopes Metron and Spot eventually succeed, he says their data models come up short in several areas at the moment, including the volatility and threat data models.

“Some of these data models, and the data fields they capture, don’t exist in Spot and Metron, and even if they do exist, they don’t capture the richness of data that we need,” he says. “I think that’s partly due to the fact that those [projects] haven’t been developed and driven by specific and broad sets of use cases.”

Sitting on Hadoop

While Panaseer isn’t waiting for Cloudera and Hortonworks to accelerate the maturation of the Spot and Metron projects, the company is moving ahead with solid plans to use Hadoop as the core underlying platform upon which to build its security solution.

“We’ve got a single platform that’s meant to fulfil the needs of lots and lots of stakeholders with varying levels of skill,” Goonatilake says. “I can’t really think of an equivalent platform that can work in the enterprise.”

Hadoop provides Panaseer with a general purpose big data platform that has the flexibility to handle a range of data and processing. “When you’re looking at cybersecurity data sets, they tend to range in complexity and volume,” Goonatilake says. “You can go from structured to unstructured. You can go from relatively small data sets like asset or HR databases, all the way up to very high volumes of data, like firewall logs or proxy logs.”

And the type of analysis that Panaseer wants to do on it also varies in complexity. “We might want to go from very basic lookups to freetext search type of use cases, to SQL type queries to more free-form programmatic analytics to machine learning algorithms,” he says. “And when we’re trying to combine these requirements, both from a data side as well as the analysis side, we need some kind of a general purpose big data platform that can enable these requirements, which is why we have invested in building on top of Hadoop.”
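
The range Goonatilake describes can be illustrated at toy scale. The sketch below runs a lookup, a free-text style search, a SQL aggregate, and a small programmatic analysis against one shared data set. It uses Python’s built-in SQLite as a stand-in rather than Hadoop or Spark, omits the machine learning step, and relies on an invented `proxy_log` table with made-up rows, so it shows the shape of the requirement rather than how Panaseer implements it.

```python
import sqlite3

# Hypothetical proxy-log rows: (username, url, bytes_sent).
rows = [
    ("alice", "http://example.com/login", 512),
    ("bob", "http://files.example.net/upload", 90000),
    ("alice", "http://files.example.net/upload", 120000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proxy_log (username TEXT, url TEXT, bytes_sent INTEGER)")
conn.executemany("INSERT INTO proxy_log VALUES (?, ?, ?)", rows)

# 1. Basic lookup: every row for one user.
alice_rows = conn.execute(
    "SELECT url, bytes_sent FROM proxy_log WHERE username = ? ORDER BY bytes_sent",
    ("alice",),
).fetchall()

# 2. Free-text style search: substring match on the URL.
upload_hits = conn.execute(
    "SELECT COUNT(*) FROM proxy_log WHERE url LIKE ?", ("%upload%",)
).fetchone()[0]

# 3. SQL-type query: total bytes sent per user.
bytes_by_user = dict(
    conn.execute(
        "SELECT username, SUM(bytes_sent) FROM proxy_log GROUP BY username"
    ).fetchall()
)

# 4. Free-form programmatic analysis: users whose total exceeds the mean.
mean_bytes = sum(bytes_by_user.values()) / len(bytes_by_user)
above_mean = sorted(u for u, b in bytes_by_user.items() if b > mean_bytes)

conn.close()
print(alice_rows, upload_hits, bytes_by_user, above_mean)
```

All four analyses read the same table, which is the point of the quote: one copy of the data serving several styles of question.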

Like most other big data software companies, Panaseer has availed itself of other members of the broader Hadoop stack, including Apache Kafka for ingesting security data from a range of sources; Apache Spark for writing SQL and machine learning models; and Apache HBase for serving reports.

So far, the company has attracted about 10 customers, most of which are banks and financial services companies in the UK and the US. The company recently established an office in New York City, which will give it the proximity to help some of the biggest banks in the world build security data lakes of their own.

The company has the backing of the UK government, including former Prime Minister David Cameron. In fact, Panaseer accompanied Cameron to Washington DC last year as part of a cybersecurity delegation from the UK. “Panaseer’s application of advanced data science techniques to help secure the enterprise is an excellent example of British cyber innovation with global market potential,” says Andy Williams, the UK government’s cyber envoy to the US.

Gartner has pegged the cybersecurity market as a $71 billion business, with rapid growth foreseen in the coming years. Panaseer seems well positioned to capitalize on that trend.
