September 16, 2013

The Big Data Security Gap: Protecting the Hadoop Cluster

As Big Data, and more specifically Hadoop, has begun to take hold in the enterprise, it’s not surprising that a discussion of Big Data security has followed. Data security is not only a best practice but an imperative for any business that stores or transacts sensitive information such as financial, HR, or healthcare records.

Hadoop enables the distributed processing of large data sets across clusters of computers, but its approach presents a unique set of security challenges that many enterprise organizations aren’t equipped to handle. Open source approaches to securing Hadoop are still in their infancy, yet Big Data needs the same security controls that have been applied to “regular” data for many years. Without them, a data breach could compromise sensitive data and tarnish an organization’s reputation and brand.

Here’s what every organization needs to know about Hadoop security:

Hadoop is Not Inherently Secure

Hadoop, like earlier foundational technologies such as UNIX and TCP/IP, was not created with security in mind. The open source Hadoop community supports some security features: Kerberos (a method of secure authentication), firewalls, and basic HDFS permissions. But even Kerberos is not mandatory for a Hadoop cluster, so entire clusters can run without any security deployed. Hadoop may be a new technology, but it lags behind the security requirements of today’s enterprise.
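Because Kerberos is optional, a quick configuration check can show whether a cluster is running with security switched off. The sketch below is a minimal illustration, not a complete audit: it reads the text of a core-site.xml file and flags two standard Hadoop properties, hadoop.security.authentication and hadoop.security.authorization, when they are left at their insecure defaults. The function name and the sample XML are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The two property names are standard Hadoop settings in core-site.xml.
# The function name and SAMPLE document are illustrative assumptions.
def audit_core_site(xml_text):
    """Return warnings for security settings left at insecure values."""
    root = ET.fromstring(xml_text)
    props = {
        p.findtext("name", "").strip(): p.findtext("value", "").strip()
        for p in root.iter("property")
    }
    warnings = []
    # Hadoop defaults to "simple" authentication, which trusts the
    # user name supplied by the client.
    if props.get("hadoop.security.authentication", "simple").lower() != "kerberos":
        warnings.append("authentication is not kerberos")
    # Service-level authorization is off unless explicitly enabled.
    if props.get("hadoop.security.authorization", "false").lower() != "true":
        warnings.append("service-level authorization is disabled")
    return warnings

SAMPLE = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for warning in audit_core_site(SAMPLE):
        print("WARNING:", warning)
```

A check like this only confirms that the switches are on; it says nothing about whether Kerberos principals, keytabs, or HDFS permissions are set up correctly.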

Distributed Computing Represents a Different Security Challenge

In a Hadoop cluster environment, data is processed wherever resources are available, supported by massively parallel computation. This is quite different from the centralized architecture of a traditional relational datastore. Hadoop’s distributed architecture creates an environment that is highly vulnerable to attack at multiple points, as opposed to the centralized repositories that are monolithic and easier to secure. Data within Hadoop clusters is fluid, with multiple copies moving to and from different nodes to ensure redundancy and resiliency. Data can also be sliced into fragments that are shared across multiple servers. These characteristics add new complexity, and demand a different approach to data security.
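To make the point about fragments and copies concrete, here is a toy model of how a single file ends up spread across a cluster. It is a simplified sketch: real HDFS placement is rack-aware, whereas this example picks nodes at random. The 128 MB block size, the replication factor of 3, the 20-node cluster, and the function name are all assumed values chosen for illustration.

```python
import math
import random

def place_blocks(file_size_mb, nodes, block_size_mb=128, replication=3, seed=0):
    """Toy placement: split a file into blocks and put each replica
    on a distinct, randomly chosen node.

    Real HDFS placement is rack-aware; random choice is a simplifying
    assumption made for this sketch.
    """
    rng = random.Random(seed)
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    # rng.sample returns `replication` distinct nodes for each block.
    return {block: rng.sample(nodes, replication) for block in range(n_blocks)}

if __name__ == "__main__":
    nodes = [f"node{i:02d}" for i in range(20)]
    placement = place_blocks(file_size_mb=1024, nodes=nodes)
    touched = {n for replicas in placement.values() for n in replicas}
    total_replicas = sum(len(r) for r in placement.values())
    print(f"{len(placement)} blocks, {total_replicas} replicas, "
          f"{len(touched)} of {len(nodes)} nodes hold part of the file")
```

Even in this small model, one file leaves pieces on many machines, and each of those machines has to be protected.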

Traditional Security Tools Haven’t Caught Up

The massive volume, velocity, and variety of data overwhelm existing security technologies, which were not designed and built with Big Data in mind. Traditional perimeter security technologies like firewalls or intrusion prevention systems (IPS) are often problematic when deployed within a distributed system like Hadoop. Hadoop is not a single technology, but an entire ecosystem of components including Hive, HBase, ZooKeeper, Oozie, and the JobTracker. Each of these components requires hardening. To add security capabilities to a Big Data environment, security functions need to scale with the data. Traditional data security tools that were built to support siloed data center environments simply can’t keep up with distributed, petabyte-scale volumes of data.

Popular Distributions Aren’t Incented to Add Security

Hadoop distributions from Cloudera, Hortonworks, and others lack what most enterprises would consider adequate security controls.   Because popular Hadoop distributions operate on a professional services business model – and adding security would mean the development and sale of proprietary software or hardware – they aren’t incented to address security at a cluster level. Even if they were inclined to build security offerings for their distributions, any projects would have to first be agreed upon and approved by the Apache Hadoop community.  Aside from service-level authorization and web proxy capabilities, no facilities are available to protect data stores, applications, or core Hadoop features. All Hadoop-based big data installations are built on the web services model, with few or no facilities for countering common web threats.

Security Must Move Closer to the Data

A Forrester report, the “Future of Data Security and Privacy: Controlling Big Data”, observes that security professionals apply most controls at the very edges of the network. However, if attackers penetrate your perimeter, they will have full and unrestricted access to your big data. Any organization deploying Hadoop should strive to bring security controls as close to the data as possible, ideally embedded within the cluster itself. It should also deploy commercially available solutions that provide fine-grained authentication and access controls, policy enforcement, and features that support regulatory compliance, such as centralized configuration and logging. While many organizations are still searching for the right mix of security tools and practices for their Big Data deployments, security shouldn’t be considered a roadblock, but rather a crucial component of Hadoop, and a way to accelerate its adoption within the enterprise.
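As a rough illustration of what controls close to the data can look like, the sketch below combines a role-based permission check with a central audit log, so that every request is evaluated and recorded at the point of access rather than at the network edge. It is a hypothetical, in-memory example and does not represent any product's API: the role names, paths, and class name are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-to-permission policy; role names and paths are made up.
ROLE_PERMISSIONS = {
    "analyst": {("read", "/data/sales")},
    "hr_admin": {("read", "/data/hr"), ("write", "/data/hr")},
}

@dataclass
class AccessController:
    user_roles: dict                               # user name -> set of role names
    audit_log: list = field(default_factory=list)  # one entry per decision

    def is_allowed(self, user, action, path):
        """Allow the request if any of the user's roles grants the
        action on the path or on one of its parent directories."""
        allowed = any(
            action == perm_action
            and (path == prefix or path.startswith(prefix + "/"))
            for role in self.user_roles.get(user, ())
            for perm_action, prefix in ROLE_PERMISSIONS.get(role, ())
        )
        # Every decision, allowed or denied, goes into the same log.
        self.audit_log.append(
            (datetime.now(timezone.utc).isoformat(), user, action, path, allowed)
        )
        return allowed

if __name__ == "__main__":
    ac = AccessController(user_roles={"alice": {"analyst"}})
    print(ac.is_allowed("alice", "read", "/data/sales/2013/q3.csv"))
    print(ac.is_allowed("alice", "read", "/data/hr/salaries.csv"))
    print(len(ac.audit_log), "audit entries")
```

Logging denied requests alongside allowed ones is a deliberate choice here, since the denials are often what reporting and forensics need most.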

Zettaset Orchestrator: Purpose-Built Security Solution for Hadoop and Big Data

Only a new approach that addresses the unique architecture of distributed computing can meet the security requirements of the enterprise data center and the Hadoop cluster environment. 

Zettaset Orchestrator™ provides an enterprise-class security solution for big data that is embedded in the data cluster itself, moving security as close to the data as possible, and providing protection that perimeter security devices such as firewalls cannot deliver.  At the same time, Orchestrator addresses the security gaps that open-source solutions typically ignore, with a comprehensive big data management solution which is hardened to address policy, compliance, access control and risk management within the Hadoop cluster environment.

Orchestrator addresses the critical security gaps that exist in today’s distributed big data environment with these capabilities:

  • Fine-grained Access Control – Orchestrator strengthens user authorization with fine-grained Role-based Access Control (RBAC), which gives administrators the flexibility to assign customized permissions for any user, in any organizational unit.
  • Policy Management – Orchestrator simplifies the integration of Hadoop clusters into an organization’s existing security policy framework with seamless support for LDAP and AD.
  • Compliance Support – Orchestrator enables Hadoop clusters to meet compliance requirements for reporting and forensics by providing centralized configuration management, logging, and auditing.  This also enhances security by maintaining tight control of ingress and egress points in the cluster and history of access to data.

Zettaset Orchestrator is the only solution that has been specifically designed to meet the security requirements of the distributed architectures which predominate in big data and Hadoop environments. Orchestrator creates a security wrapper around any Hadoop distribution and distributed computing environment, making it enterprise-ready. Orchestrator is evolving to include even more enterprise security capabilities that simply aren’t available from the open source community. With Orchestrator, organizations can now confidently deploy Hadoop in data center environments where security and compliance are business imperatives.

Datanami