May 2, 2016

How Big Data Improves Logging and Compliance

Joe Goldberg

Compliance has never been easy. Organizations have to meet a myriad of external regulations, frameworks, and internal mandates such as PCI, HIPAA, FISMA, NERC, ISO and the EU Data Directive, many of which have a long list of required technical controls. Many large organizations must comply with five or more regulations or mandates. Meeting the technical controls these regulations and mandates require is difficult, and two areas are especially challenging:

  • Requirements around logging, monitoring and analyzing security events for incident detection and investigations, especially when logs need to be retained for months or years;
  • Measuring and demonstrating compliance with all the various technical controls.

Additionally, when companies are audited, they may receive ad-hoc requests for event data or custom reports, which can be challenging if the proper systems aren’t in place. When these requests come in, you cannot simply use the user interfaces of multiple security products to prove compliance, because the data is siloed and is often stored only for a limited time. To help, organizations often turn to Security Information and Event Management (SIEM) software, which can centrally collect event and log data from security devices. In turn, these logs can be harnessed for correlations and rules to detect threats, for after-the-fact incident investigations and response, and for compliance reporting.

However, traditional SIEMs are coming up short when it comes to dealing with the sheer volume of event data and log files needed to unravel today’s compliance knots. Traditional SIEMs simply can’t keep up because they often use a single, relational datastore on the back-end, which is a problem for several reasons:

  • The fixed schema of their datastore means they cannot index all logs and security events. They usually have to normalize raw data to fit their datastore schema, which means valuable log data is lost that otherwise might be needed for threat detection or investigation.
  • A single datastore is a point of failure and chokepoint, which prevents scale and speed.
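To make the fixed-schema problem concrete, here is a minimal sketch of what normalization can discard. The column names, the sample event and the helper functions are all hypothetical, chosen only for illustration; no real SIEM schema is implied.

```python
import json

# Hypothetical fixed schema, standing in for a relational SIEM table.
FIXED_COLUMNS = ("timestamp", "src_ip", "action")

def normalize(event: dict) -> dict:
    """Keep only the columns the fixed schema knows about."""
    return {col: event.get(col) for col in FIXED_COLUMNS}

def dropped_fields(event: dict) -> set:
    """Fields present in the raw event that normalization discards."""
    return set(event) - set(FIXED_COLUMNS)

# A made-up raw event from a custom application.
raw_line = json.dumps({
    "timestamp": "2016-05-02T10:15:00Z",
    "src_ip": "10.0.0.7",
    "action": "login_failed",
    "user_agent": "curl/7.47",
    "session_id": "abc123",
})

event = json.loads(raw_line)
row = normalize(event)        # what the fixed-schema store would keep
lost = dropped_fields(event)  # what an investigator can no longer query

print(row)
print(sorted(lost))
```

In this sketch the user agent and session identifier never reach the datastore, which is exactly the kind of detail an investigation might later need.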

Another limitation of traditional SIEMs is their rigid user interface, with inflexible search and report-building capabilities. This hampers the ability to create single reports that span multiple regulations or to run custom searches to satisfy an ad-hoc auditor request.

I worked with a Fortune 500 retailer that had to comply with PCI, HIPAA, SOX, GLBA and internal mandates. They initially had no centralized logging, and to measure technical controls they had to painfully log into multiple user interfaces. To perform incident investigations, they often had to manually Secure Shell (SSH) into physical servers one at a time to retrieve logs and then manually try to correlate threat activity across these logs.

The company was struggling to comply and their auditors were not happy with them. Ouch. So they bought a traditional SIEM and things became marginally better, because now all the logs needed for security and compliance use cases were in one place. However, they found that the SIEM struggled to ingest “non-standard” log formats from custom applications, searches were painfully slow to run even against only 10 GB of log data a day, and it was difficult to create the custom reports their auditors were asking for. So still ouch.


Here Comes the Good News – Big Data

If this sounds familiar to you, the good news is that there is an answer: big data. Big data solutions have technology that can handle the volume, velocity and variety of log and machine data you need for compliance. Big data uses a flat file data store rather than a relational datastore or database, scales horizontally on commodity hardware, and uses distributed, Google-like search for scale and speed. Big data software also offers flexible and powerful capabilities that can be used for SIEM use cases such as correlations, alerting and reporting. As a result, big data solutions don’t suffer from the limitations of traditional solutions:

  • Flat file data store means the big data solution can index all the original, raw data and make it all available for security or compliance purposes.
  • Distributed search architecture means fast scale and speed when it comes to data ingestion, searching, reporting and alerting.
  • Flexible UI and search/reporting capabilities mean users can easily create the reports and run the searches needed to show compliance status. Also, big data solutions can easily pivot through raw logs to facilitate incident investigations and response.
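The first two points above can be sketched in a few lines: raw log lines are stored verbatim, every token is indexed without a predefined schema, and a query is scattered across shards and the results gathered. This is a toy illustration under assumed names (`Shard`, `distributed_search`) with made-up log lines, not a description of how any particular product works.

```python
from collections import defaultdict
import re

class Shard:
    """One node's slice of the log data: raw lines plus a token index."""

    def __init__(self):
        self.lines = []                # raw lines, kept verbatim
        self.index = defaultdict(set)  # token -> positions in self.lines

    def ingest(self, line: str) -> None:
        pos = len(self.lines)
        self.lines.append(line)
        # No schema: every token in the line becomes searchable.
        for token in re.findall(r"[\w.]+", line.lower()):
            self.index[token].add(pos)

    def search(self, term: str) -> list:
        positions = self.index.get(term.lower(), ())
        return [self.lines[i] for i in sorted(positions)]

def distributed_search(shards, term):
    """Scatter the query to every shard and gather the results."""
    results = []
    for shard in shards:
        results.extend(shard.search(term))
    return results

shards = [Shard(), Shard()]
logs = [
    "2016-05-02 10:15:00 sshd failed password for root from 10.0.0.7",
    "app=checkout status=500 card_last4=4242",
    "2016-05-02 10:16:12 sshd accepted password for alice from 10.0.0.9",
]
# Round-robin the lines across shards, as a stand-in for horizontal scaling.
for n, line in enumerate(logs):
    shards[n % len(shards)].ingest(line)

hits = distributed_search(shards, "sshd")
print(hits)
```

Note that the "non-standard" application line is searchable by `card_last4` or `checkout` without anyone having defined those fields in advance. In a real deployment the shards would run on separate machines and be queried in parallel.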

Big data technology has the capabilities required of an enterprise-ready compliance/logging/SIEM solution, including real-time searching and alerting, role-based access control, data hashing to demonstrate that logs have not been tampered with, and granular control over how long to retain logs. Big data solutions from many vendors also include pre-built searches and reports for various regulations.
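As an illustration of how data hashing can demonstrate that logs have not been tampered with, here is a minimal hash-chain sketch using SHA-256 from Python's standard library. The function names, the seed value and the sample log lines are assumptions made for this example; vendors may implement integrity checks quite differently.

```python
import hashlib

def chain_hashes(lines, seed=b"genesis"):
    """Return one SHA-256 digest per line, each covering the previous digest."""
    digests = []
    prev = hashlib.sha256(seed).digest()
    for line in lines:
        prev = hashlib.sha256(prev + line.encode("utf-8")).digest()
        digests.append(prev.hex())
    return digests

def verify(lines, digests, seed=b"genesis"):
    """True only if recomputing the chain reproduces the stored digests."""
    return chain_hashes(lines, seed) == digests

# Made-up log lines for illustration.
logs = [
    "10:15:00 login_failed user=root",
    "10:15:04 login_failed user=root",
    "10:15:09 login_ok user=root",
]
sealed = chain_hashes(logs)  # stored separately from the logs themselves

tampered = list(logs)
tampered[0] = "10:15:00 login_ok user=root"

print(verify(logs, sealed))
print(verify(tampered, sealed))
```

Because each digest covers the one before it, altering or deleting any earlier line changes every later digest, so an auditor only needs the stored digests to detect the change.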

The result is that big data technologies can scale to meet event logging, monitoring and retention requirements, and can automatically measure the overall effectiveness and status of essentially any technical control in real time. Basically, these technologies represent one solution that covers many regulations or mandates and also leads to a strong ROI. Simply put, by using big data as a “next-generation SIEM,” companies can implement better, faster and lower-cost compliance.

So let’s go back to that Fortune 500 retailer I mentioned earlier. They ditched their traditional SIEM for a big data solution and saw immediate benefits. Now, they can collect all of the raw data from all of their data sources and make the data available for custom reports or any visualization to show compliance with any technical control. Ad-hoc queries against terabytes of logs, whether to satisfy an auditor’s request or to support an incident investigation, now take only seconds. And these queries run against a daily log volume of more than a terabyte.
Consequently, their lives are much better now and their auditors couldn’t be happier. Good news all around! So good, in fact, that the retailer made the big data solution a key component of their Security Operations Center (SOC). They even started sending their big data solution logs from new data sources for different use cases, including IT Operations monitoring and troubleshooting, and even fraud detection. In their words, big data has become their “data fabric” connecting the entire enterprise.

So next time you find yourself struggling with event logging or compliance, give big data a look!


About the author: Joe Goldberg is the Security/Compliance/Anti-Fraud Evangelist at Splunk. Goldberg’s responsibilities at Splunk include technical product marketing and evangelism for security and compliance use cases. He is also a published contributor for Wired Magazine, Dark Reading and SC Magazine. Prior to Splunk, he did Technical Product Marketing for the Data Loss Prevention product at Symantec. Previously, he did product marketing for both VMware and Sun Microsystems. He has also worked in the financial services industry doing venture capital and corporate development. He has an M.B.A. from the Wharton School at the University of Pennsylvania and a business degree from the Haas School of Business at UC Berkeley.
