November 20, 2013

Hadoop and the Encryption Mandate

Alex Woodie

In a perfect world, it would be a no-brainer to encrypt data housed in Hadoop. After all, a Hadoop cluster loaded with petabytes’ worth of customer data would appear to be a veritable honey pot of criminal goodness for hackers and other cyber ne’er-do-wells to exploit. However, the mandate to protect Hadoop-based data with strong encryption is not as clear as it might appear to be.

For starters, encryption is not something that is built in or provided by the open source software you can download at hadoop.apache.org. Hadoop was designed, in effect, to be a poor man’s parallel processing engine. Anything that didn’t contribute to the goal of analyzing as much data as quickly as possible simply was not included in the original design.

Now that Hadoop is making its way into the enterprise, security is becoming a major concern. To that end, Hadoop distributors and third-party software vendors are shipping add-on capabilities that deliver enterprise-level security features, such as compatibility with corporate authentication systems, data masking, and strong data encryption.

However, it’s not as easy as just slapping some Linux-supported encryption routines onto your Hadoop cluster. The sheer scale of the data stored in Hadoop requires a careful approach to encryption, lest the process of encrypting and decrypting data as it is accessed grind the cluster’s processing to a halt. In the corporate world of Big Iron, this is why we see vendors adding dedicated processor cards specifically to handle encryption workloads.

One of the third-party vendors taking a whack at the encryption dilemma is Zettaset, which today announced it added data-at-rest encryption capabilities to Zettaset Orchestrator, a software product designed to make Hadoop more secure, more available, better performing, and easier to manage.

Configuration of AES encryption is done from a GUI in Zettaset’s product.

Zettaset says that Orchestrator now supports strong 256-bit Advanced Encryption Standard (AES) encryption of data as it sits in HDFS. This will give customers in regulated industries the tools they need to ensure that any personally identifiable information stored in their Hadoop clusters is protected, and thereby comply with mandates such as HIPAA, BSA/AML, and PCI-DSS.

It’s all about making Hadoop just another obedient cog in the security policy machine that enterprises are putting in place (if they haven’t built one already). To that end, Orchestrator’s new encryption functions support KMIP, an encryption key management standard that is starting to take hold in the enterprise. This will allow Zettaset customers to use any KMIP-supported key management product they want, and eliminate concerns about vendor lock-in.

Here’s how Zettaset says the encryption scheme works in its product: “When the data node boots, it authenticates to a key server to retrieve the keys and then uses them to unlock the partitions. The kernel then will decrypt the data on disk when it is read and encrypt it when it is written.” The keys can be stored either on KMIP-compliant key servers or in PKCS #11 Hardware Security Modules (HSMs). The software only provides encryption for data at rest at this time, although Zettaset says to “stay tuned” about encrypting data in motion.
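The boot sequence Zettaset describes (authenticate, retrieve keys, unlock partitions) can be sketched roughly as follows. This is a minimal illustration only, not Zettaset’s code: the `KeyServer` and `DataNode` classes, the HMAC challenge-response step, and the device names are all assumptions made for the example, and the in-memory key server merely stands in for a real KMIP server or PKCS #11 HSM.

```python
import hashlib
import hmac
import secrets


class KeyServer:
    """Hypothetical in-memory stand-in for a KMIP key server or PKCS #11 HSM."""

    def __init__(self):
        self._node_secrets = {}    # node_id -> shared secret used to authenticate
        self._partition_keys = {}  # (node_id, partition) -> 256-bit key

    def enroll(self, node_id, partitions):
        """Register a node, create one 256-bit key per partition, return the node secret."""
        secret = secrets.token_bytes(32)
        self._node_secrets[node_id] = secret
        for partition in partitions:
            self._partition_keys[(node_id, partition)] = secrets.token_bytes(32)
        return secret

    def new_challenge(self):
        """Issue a random challenge (a real server would also track and expire it)."""
        return secrets.token_bytes(16)

    def fetch_keys(self, node_id, challenge, response):
        """Return the node's partition keys if the challenge response checks out."""
        secret = self._node_secrets.get(node_id)
        if secret is None:
            raise PermissionError("unknown node")
        expected = hmac.new(secret, challenge, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, response):
            raise PermissionError("authentication failed")
        return {p: k for (n, p), k in self._partition_keys.items() if n == node_id}


class DataNode:
    """Hypothetical data node that unlocks its partitions at boot."""

    def __init__(self, node_id, secret, partitions):
        self.node_id = node_id
        self._secret = secret
        self.partitions = list(partitions)
        self.unlocked = {}  # partition -> key currently in use

    def boot(self, server):
        # Step 1: authenticate to the key server.
        challenge = server.new_challenge()
        response = hmac.new(self._secret, challenge, hashlib.sha256).digest()
        # Step 2: retrieve the keys.
        keys = server.fetch_keys(self.node_id, challenge, response)
        # Step 3: "unlock" each partition; a real system would pass the key
        # to the kernel's block-level encryption layer at this point.
        for partition in self.partitions:
            self.unlocked[partition] = keys[partition]
        return sorted(self.unlocked)


if __name__ == "__main__":
    server = KeyServer()
    devices = ["/dev/sdb1", "/dev/sdc1"]  # illustrative device names
    secret = server.enroll("datanode-1", devices)
    print(DataNode("datanode-1", secret, devices).boot(server))
```

The point of the design is that keys never live on the data node’s disks: a node that cannot authenticate gets no keys, and its partitions stay locked.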

Zettaset isn’t the only company offering encryption for Hadoop. Dataguise sells a product called DG for Hadoop that can encrypt or mask data in HDFS. Hadoop distributor MapR Technologies has a partnership with Dataguise to sell this product. Another Hadoop encryption tool comes from Voltage Security, which has a partnership with Hortonworks for Voltage’s Format-Preserving Encryption (FPE) technology. Protegrity sells a product called the Big Data Protector that can apply encryption to Hadoop, and Gazzang also touts an encryption solution for Hadoop. IBM makes several encryption and data masking capabilities available to its BigInsights Hadoop users, including those offered via Zaloni, another Hadoop management layer.

Cloudera has implemented some forms of encryption in its Cloudera Distribution for Hadoop (CDH) software, including over-the-wire encryption introduced with CDH version 4.1. But Cloudera advises customers to look toward third-party libraries for on-disk encryption at the Linux file system level. Intel, which has a Hadoop distribution of its own, has taken a different approach that relies on the Advanced Encryption Standard New Instructions (AES-NI) baked into the latest generation of its Xeon processors. According to Intel, the AES-NI approach accelerates encryption performance in Hadoop by up to 5.3x and decryption performance by up to 19.8x.

Encryption performance on Hadoop is also a concern for Zettaset, which is currently embroiled in a lawsuit with Intel, whom it accuses of basically stealing and reselling its Orchestrator product wholesale. Zettaset says the Orchestrator encryption feature “has been carefully engineered to eliminate any noticeable impact on Hadoop cluster performance.” It’s tough to know exactly what that means, but the company is not saying there will be no impact on Hadoop. Rather, it’s admitting there will be “negligible impact” on Hadoop performance.

Playtime is over.

Zettaset CEO Jim Vogt says the partition-based scheme used in Orchestrator’s encryption capabilities helps to minimize the performance hit. “We take a unique approach, which is done below the file system at the kernel level. Because the encryption/decryption is done in the kernel, there is very little latency. When AES-NI hardware is not available we use software AES in the kernel,” he says.
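The hardware-or-software choice Vogt describes can be illustrated with a small check. On x86 Linux systems, a CPU that supports AES-NI lists the `aes` flag in `/proc/cpuinfo`. The sketch below is an assumption-laden illustration of that decision, not Zettaset’s implementation (which runs in the kernel); the function names are made up for the example.

```python
def cpu_has_aes_ni(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        # Only x86 'flags' lines count; match 'aes' as a whole token.
        if sep and name.strip() == "flags" and "aes" in value.split():
            return True
    return False


def choose_aes_backend(cpuinfo_text):
    """Pick hardware AES when the CPU advertises it, otherwise fall back to software."""
    return "hardware (AES-NI)" if cpu_has_aes_ni(cpuinfo_text) else "software AES"


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the CPU description; return an empty string where the file is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(choose_aes_backend(read_cpuinfo()))
```

On a machine without `/proc/cpuinfo`, the sketch simply reports the software fallback.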

There is a price to be paid for having good security. The old notion that “you get what you pay for” holds true in the world of security, where IT professionals recognize and accept that there is always a tradeoff between freedom and security. The wild days of young Hadoop clusters running free in the fields are coming to a close. The big elephant is donning a suit and tie and joining the rest of the corporate world.

Related Items:

Closing the Big Data Networking Gap

IBM Taps Zaloni to Ride Herd on Hadoop

Zettaset Puts Hadoop on Lockdown
