August 6, 2015

The Importance of Securing Big Data: The Numbers Don’t Lie

Cynthia Leonard

If you are using any sort of Big Data in your enterprise environment, you are no doubt concerned about securing that data. As more and more organizations see the potential and harness the power of Big Data, they are turning to Hadoop to process these large data sets. As these Hadoop projects are deployed, businesses are realizing that they must protect sensitive customer data, as well as partner and internal information, while adhering to expanding compliance requirements.

One of the great strengths of a Hadoop deployment is that it processes extremely large amounts of data from multiple sources and enterprise systems in real time for analytics and new business insights. However, this also brings varying (or unknown) data protection requirements. As these different types of data are combined and stored in a Hadoop data lake, the information is then accessed by many different users with varying analytic needs. Businesses risk losing even more control over the data if Hadoop clusters are deployed in a cloud environment.

At the February 2015 Strata + Hadoop World Conference in San Jose, HP Security Voltage conducted an anonymous survey querying attendees about their protection of sensitive data, and their current approach to securing data in Hadoop. With almost 200 attendees participating, the results are revealing and show that protecting sensitive data in Hadoop is a top-of-mind concern for over 70% of the survey participants.

We found that:

  • 70% of the survey participants said their business currently uses some form of sensitive data such as PCI (payment card information), PII (personal identity information) or PHI (protected health information)
  • When it comes to protecting that sensitive data, 52% said they use encryption and 21% use tokenization
  • 67% of the survey attendees said they were currently planning Big Data projects involving sensitive data
  • When asked what kind of sensitive data they need to secure for their Big Data projects, 44% said they need to secure credit card numbers, 41% need to secure social security numbers, 66% need to secure names and addresses and 53% need to secure date of birth
  • When asked what Hadoop distribution they use, many are using more than one distribution: 46% are using Cloudera, 29% are using Hortonworks, 15% are using MapR, 8% are using IBM and 32% are using a proprietary tool

Interestingly, we asked the same questions at the previous Strata + Hadoop World New York in October 2014, and there was a noticeable difference in the types of sensitive data attendees’ projects needed to secure, perhaps due to the data breach headlines at the time showcasing the need to secure sensitive data. Compared to the New York answers, 41% of San Jose respondents said they need to secure social security numbers (vs. 27% in New York), and 66% said they need to secure names and addresses (vs. 45% in New York).

Adding to these responses, HP Security Voltage gathered more data when it participated in the April 2015 RSA Conference in San Francisco. The annual RSA Conference is the world’s leading information security conference, and this year’s theme was “Where the World Talks Security.”

Over the duration of the conference, we conducted another anonymous survey querying attendees about their protection of sensitive data, and their current approach to securing data in the cloud. With 350 attendees participating, the results showed that being compromised in a cloud-based app was a top-of-mind concern for 80% of the survey participants. Looking to the future, 56% of the respondents said they were currently planning Big Data projects involving sensitive data.

Best Practices to Secure Big Data in Hadoop

We strongly advocate a data-centric approach that protects sensitive data end-to-end from the moment of capture, as it is processed, used, and stored across a variety of devices, operating systems, databases, and applications. This data-centric approach helps enterprises neutralize data breaches by rendering data valueless to attackers, de-identifying data through encryption, tokenization and data masking.
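To make the idea concrete, here is a minimal sketch in plain Python of de-identifying sensitive fields at the moment of capture, so what lands in the data lake is valueless to an attacker while analytics-friendly fields remain usable. The field names, the placeholder key, and the keyed-hash pseudonymization are illustrative assumptions, not Voltage's API or algorithm:

```python
import hashlib
import hmac

# Illustrative only: the key would come from a key-management system,
# and the field names below are assumptions, not a product API.
SECRET_KEY = b"replace-with-a-managed-key"

def pseudonymize(value: str) -> str:
    """Replace a value with a deterministic keyed token (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def mask_name(name: str) -> str:
    """Keep only the first letter of each name part, e.g. 'Jane Smith' -> 'J*** S****'."""
    return " ".join(part[0] + "*" * (len(part) - 1) for part in name.split())

def de_identify(record: dict) -> dict:
    """Return a copy of the record that is safe to store in the data lake."""
    safe = dict(record)
    safe["ssn"] = pseudonymize(record["ssn"])  # deterministic: joins and counts still work
    safe["name"] = mask_name(record["name"])
    return safe

raw = {"name": "Jane Smith", "ssn": "123-45-6789", "zip": "94089", "spend": 120.50}
print(de_identify(raw))
```

Because the token is deterministic, analysts can still group, join, and count on the protected field without ever seeing the original value.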

Here are some steps you can take to secure your Big Data in Hadoop:

  • First, take an inventory of all the data you intend to store in your Hadoop environment and identify the sensitive data.
  • Next, perform threat modeling on sensitive data. The goal of threat modeling is to identify the potential vulnerabilities of at-risk data and to know how the data could be used against you if stolen.
  • Then, identify the business-critical values within the sensitive data. It’s no good to make the data secure if the security tactic also destroys its business value.
  • After that, apply tokenization and format-preserving encryption to data as it is ingested (see the sketch after this list). Format-preserving technologies enable the majority of your analytics to be performed directly on the de-identified data, securing data-in-motion and data-in-use. Data that has been de-identified has no value to a hacker if a breach occurs.
  • Lastly, provide data-at-rest encryption throughout the Hadoop cluster. When hard drives age out of the system and need replacing, encryption of data-at-rest means you won’t have to worry about what could be found on a discarded drive once it has left your control.
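As referenced in the tokenization step above, the sketch below illustrates the property that makes format-preserving protection useful for analytics. It is a toy keyed-hash tokenizer, not a real FPE algorithm such as NIST FF1 and not Voltage's implementation; the key and the "keep the last four digits" layout are assumptions. The point is that the token keeps the same length and character class as the card number, so schema checks and format-dependent analytics keep working on the de-identified value.

```python
import hashlib
import hmac

KEY = b"demo-only-key"  # assumption: in practice this comes from a key manager

def tokenize_card(pan: str, keep_last: int = 4) -> str:
    """Toy format-preserving token: all digits, same length, last four preserved."""
    digits = pan.replace(" ", "").replace("-", "")
    head_len = len(digits) - keep_last
    digest = hmac.new(KEY, digits.encode(), hashlib.sha256).digest()
    # Map the keyed digest onto decimal digits to build the replacement head.
    head = "".join(str(b % 10) for b in digest[:head_len])
    return head + digits[-keep_last:]

card = "4111 1111 1111 1111"
token = tokenize_card(card)
print(token, len(token))  # 16 digits ending in 1111; same card always yields the same token
```

A deterministic token like this still supports joins and distinct counts across the cluster without exposing the real number; data-at-rest encryption, the last step above, then protects whatever must remain on disk in full.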

Using a data-centric security strategy as you plan and implement Big Data projects or Hadoop deployments can neutralize the effects of damaging data breaches and help ensure attackers will glean nothing from attempts to breach Hadoop in the enterprise.


About the author: Cynthia Leonard is Marketing Program Manager for HP Security Voltage. She joined HP Security Voltage in April 2015 as marketing manager for Communications and Brand. Cynthia provides writing, editing and content creation expertise across all of HP Security Voltage’s marketing and internal communications channels.

Related Items:

Crypto Tools Target Hadoop Security Gaps

The Big Data Security Gap: Protecting the Hadoop Cluster

Big Data Breach: Security Concerns Still Shadow Hadoop
