October 28, 2013

MapR Unveils Strong Authentication for Hadoop

Alex Woodie

One of the knocks against using Hadoop in an enterprise setting has been the relatively weak security of Hadoop and its surrounding framework. MapR Technologies today chipped away at that concern by unveiling its plans for strong user and program authentication capabilities in its Hadoop distribution.

MapR says the “native” authentication methods that it just released into beta will protect its Hadoop customers against several related security threats, including user impersonation, rogue daemons, and malicious remote procedure calls. By backing authentication with 256-bit AES and SHA cryptography, MapR aims to assure customers that any person or program requesting access to the Hadoop cluster and its resources is authorized to do so.

MapR is taking two approaches to native, wire-level authentication: one based on Kerberos and one based on Linux Pluggable Authentication Modules (PAM). Both approaches check credentials against user directories such as Microsoft Active Directory or LDAP.

Customers that already have Kerberos installed in their environments will likely use the Kerberos option that MapR is making available in a future, undisclosed release of its software. Kerberos uses the concept of security “tickets” and a trusted third party to provide authentication over unsecured networks. Its use of “mutual authentication,” in which both the client and the server authenticate each other’s tickets against the third party, requires additional infrastructure. Kerberos also has the advantage of enabling single sign-on (SSO), which has made it an authentication staple in many large enterprises.

But MapR provides a secondary authentication mechanism for those who aren’t already invested in a Kerberos infrastructure. By using PAM, which is a bit more straightforward than Kerberos and doesn’t require additional infrastructure, MapR is enabling customers to get started with strong authentication.

Either approach will give MapR customers the security they need to ensure their Hadoop clusters and the data they contain remain safe from potential threats, says MapR Technologies chief marketing officer Jack Norris.

“The communication to and across cluster is all secured at the wire level,” he tells Datanami. “So it’s strong wire level authentication. It’s native. It works out of the box with Hadoop. It has flexibility in terms of how that certificate works. Some government agencies want that based on a badge or to incorporate biometrics. That’s possible with this native authentication.”

The flexibility to choose either Kerberos or to use MapR’s out-of-the-box, PAM-based authentication mechanism will appeal to customers who are building and growing their Hadoop environments, Norris says.

“There are so many applications for Hadoop that initially might not require this stringent security,” he says. “But as it continues to expand, and as some of these apps get into sensitive areas or sensitive data…then this is absolutely required for those use cases and applications. It makes it much easier and faster if you don’t have to wait until a full Kerberos environment is deployed before you can pursue that use case.”

Keeping the whole process easy to use was a requirement for MapR, whether a customer chooses Kerberos or the out-of-the-box PAM method. “If security is complicated to deploy and integrate, so much so that people don’t use it, then it’s not very secure,” Norris says. “As they were looking at higher level capabilities, access control lists [ACLs] and file permissions, etc., those were only as good as the underlying authentication. And if it was easy to spoof and impersonate a user, then it didn’t matter how fine-grained your access control was. You need to start with the authentication.”

Under the new security scheme, a proper user key (provisioned upon successful authentication via user name and password) can be required for user operations, including file reads and writes, database manipulations, and MapReduce job submissions. Any node-to-node communication that takes place within a cluster, including mirroring, can be protected by requiring the nodes to authenticate each other with the correct cluster keys. The authentication scheme extends into other Hadoop components, including ecosystem projects like Hive and Drill, MapR says.
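MapR has not published the internals of this scheme, so the following is only a minimal Python sketch of the two ideas described above, under stated assumptions: a user ticket issued after a successful login and checked on every operation, and two nodes proving to each other that they hold the same cluster key. It uses HMAC-SHA256 from the standard library as a stand-in for whatever MapR actually uses. Every name here (`CLUSTER_KEY`, `issue_ticket`, `verify_ticket`, `answer_challenge`, `nodes_trust_each_other`) is hypothetical and not part of any MapR API.

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

# Hypothetical cluster-wide secret. A real deployment would provision,
# store, and rotate this key very differently.
CLUSTER_KEY = secrets.token_bytes(32)


def _sign(key: bytes, message: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for message under key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def issue_ticket(user: str, ttl_seconds: int = 3600,
                 now: Optional[float] = None) -> dict:
    """Issue a signed ticket. Assumes the password check already passed."""
    issued = time.time() if now is None else now
    body = {"user": user, "expires": issued + ttl_seconds}
    payload = json.dumps(body, sort_keys=True).encode()
    return {"body": body, "tag": _sign(CLUSTER_KEY, payload)}


def verify_ticket(ticket: dict, now: Optional[float] = None) -> bool:
    """Accept a ticket only if its tag is valid and it has not expired."""
    current = time.time() if now is None else now
    payload = json.dumps(ticket["body"], sort_keys=True).encode()
    expected = _sign(CLUSTER_KEY, payload)
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, ticket["tag"]):
        return False
    return ticket["body"]["expires"] > current


def answer_challenge(key: bytes, node_id: str, nonce: bytes) -> str:
    """Prove knowledge of the cluster key by tagging a fresh nonce."""
    return _sign(key, node_id.encode() + b"|" + nonce)


def nodes_trust_each_other(key_a: bytes, key_b: bytes) -> bool:
    """Each node challenges the other; both responses must verify."""
    nonce_for_b = secrets.token_bytes(16)  # chosen by node A
    nonce_for_a = secrets.token_bytes(16)  # chosen by node B
    b_reply = answer_challenge(key_b, "node-b", nonce_for_b)
    a_reply = answer_challenge(key_a, "node-a", nonce_for_a)
    # Each side recomputes the expected reply with its own copy of the key.
    a_accepts_b = hmac.compare_digest(
        b_reply, answer_challenge(key_a, "node-b", nonce_for_b))
    b_accepts_a = hmac.compare_digest(
        a_reply, answer_challenge(key_b, "node-a", nonce_for_a))
    return a_accepts_b and b_accepts_a
```

In this sketch, a file read or job submission would call `verify_ticket` before doing any work, and a node would refuse to mirror data to a peer unless `nodes_trust_each_other` succeeds. The sketch authenticates only; it does not encrypt traffic, which a real wire-level scheme would also need to address.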

Norris says it was a fairly big task to implement these two authentication methods, but that it was made easier by the work MapR has done to enable a random read-write, POSIX-compliant data layer, which is largely how MapR differentiates itself from competitors. “It was basically an enabling technology for this native authentication, allowing us to easily update keys and permissions and basically have this work across the cluster,” he says. “It would be a lot more difficult if we didn’t have that underlying capability.”

There are other approaches to securing Hadoop. The Apache Sentry project, for instance, is dedicated to providing fine-grained, role-based authorization for Hadoop resources, while Apache Knox aims to help with the implementation of secure Hadoop clusters. By going off the Apache reservation, has MapR built itself into a proprietary corner? No, Norris says.

“We do the things with the architecture that make sense, without compromising the open source comments or changing compatibility,” he says. “This was another area that, by doing this at the lower levels, we were able to integrate in and provide a lot of value.”
