Cloudera Adds a Sentry to Their Stack
Cloudera today announced the newest addition to their solutions stack: an authorization framework for Hadoop called “Sentry,” which Cloudera says delivers advanced authorization controls to enable multi-user applications for two SQL query engines, Apache Hive and Cloudera Impala.
In the relational database world, security was figured out long ago. However, as we discussed last week, security is still relatively new to the fledgling Hadoop ecosystem. Vendors are just starting to come to the table with security options that have been standard fare in the old world. For Cloudera, Sentry puts another piece of the larger security puzzle in place that wasn’t there before.
“Security has been something that has been continually improving since the early days of the company,” explained Charles Zedlewski, VP of Products at Cloudera. “It’s been a big gap that we’ve been filling, and adding the ‘authorization’ piece with Sentry, we think we’ve put down a big down payment on it for a lot of our customers.”
Sentry, which Cloudera says they will submit to the Apache Incubator later this year, gives Hadoop users the ability to granularly control access to data by enabling Role-Based Access Controls (RBAC), a concept that gained prominence in the relational data world twenty years ago and is second nature today.
The challenges of providing security for the new age of distributed databases are significant, as Brian Christian, CTO of Zettaset, told us last week. With data distributed on several machines across a network, the challenge is to provide the advanced security measures that enterprises expect without introducing a new single point of failure that could cost you your data. The stakes are high, and several vendors are now coming to the table with their pieces to a long list of security challenges.
With Sentry, Cloudera has chosen to lean on the relational database world in implementing access controls. Sentry stores and maintains a repository of privileges in a standard database, which then governs the access to the data contained within Hadoop.
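To make the idea concrete, here is a minimal sketch of a role-based privilege repository kept in a relational database and consulted before a query is allowed to run. It is an illustration only, not Sentry's actual schema or API: the table names, column names, sample users and the `is_allowed` helper are all hypothetical, and SQLite stands in for whichever database (Oracle, MySQL or another) a cluster might really use.

```python
import sqlite3

# Hypothetical schema for illustration; these are NOT Sentry's real tables.
SCHEMA = """
CREATE TABLE user_roles (user_name TEXT, role_name TEXT);
CREATE TABLE role_privileges (role_name TEXT, object_name TEXT, privilege TEXT);
"""


def build_store():
    """Create an in-memory privilege repository with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Users are mapped to roles, never directly to privileges.
    conn.executemany(
        "INSERT INTO user_roles VALUES (?, ?)",
        [("alice", "analyst"), ("bob", "admin")],
    )
    # Roles carry the privileges on named objects (here, a made-up table).
    conn.executemany(
        "INSERT INTO role_privileges VALUES (?, ?, ?)",
        [
            ("analyst", "sales.orders", "SELECT"),
            ("admin", "sales.orders", "SELECT"),
            ("admin", "sales.orders", "INSERT"),
        ],
    )
    conn.commit()
    return conn


def is_allowed(conn, user, object_name, privilege):
    """Return True if any role held by `user` grants `privilege` on `object_name`."""
    row = conn.execute(
        """
        SELECT 1
        FROM user_roles AS u
        JOIN role_privileges AS p ON p.role_name = u.role_name
        WHERE u.user_name = ? AND p.object_name = ? AND p.privilege = ?
        LIMIT 1
        """,
        (user, object_name, privilege),
    ).fetchone()
    return row is not None
```

In this sketch, a query engine would call something like `is_allowed(conn, "alice", "sales.orders", "SELECT")` before executing a statement. The privilege data is small and naturally relational, which is the point Zedlewski makes below.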
“Whether it be Oracle or MySQL, you can pick your poison, and leverage all of the underlying security features of that database,” explains Zedlewski, who adds that complex data is not an issue where role-based access is concerned. “With data this small, and when the workload is relational, it just makes more sense to use a relational database.”
If using a relational database to implement security for Hadoop, a framework that Cloudera hailed just last month as the center of gravity in the datacenter, strikes you as odd, it shouldn’t, says Zedlewski.
“There’s actually kind of a long tradition of doing this,” he says, referring to the use of relational databases to implement Hadoop features. “For example, the Hive Metastore, or HCatalog all have metadata stored in a traditional database… Sentry can leverage the same database that’s being used by other parts of the Hadoop stack. You have at least one database hanging off your Hadoop cluster today – I guarantee it – and we can leverage that existing one so that you don’t have to create a net new one just for this.”
Sentry adds one more piece to a security puzzle that Cloudera divides into four distinct parts:
- Perimeter: Guarding access to the cluster itself through network security, firewalls and ultimately, authentication to confirm user identities.
- Data: Protecting the data in the cluster from unauthorized visibility through masking and encryption, both at rest and in transit.
- Access: Defining what authenticated users and applications can do with the data in the cluster through file system access control lists and fine-grained authorization.
- Visibility: Reporting on the origins of data and on data usage through centralized auditing and lineage capabilities.
Currently, Cloudera provides perimeter security through Oozie, which offers the ability to create custom authentication. Earlier this year, the company unveiled Cloudera Navigator, which addressed the “visibility” portion of their security framework, providing auditing and disaster recovery technologies. With Sentry, Cloudera tackles the “access” portion. However, Cloudera is still relying on their certified partners for the data encryption and masking portion of the defined framework.
There’s still a lot of work to be done, says Zedlewski, who adds that the company will continue to push to make data both more secure and more convenient to work with once secured.
“One of the things that we’re going to be working on is making the way that people request access for things much more streamlined,” he explains. “It’s one thing to secure things at a certain level, but the question is, how do I grant rights to other people, and how do I do that at scale when you’ve got hundreds of users that want to get onto the cluster. There’s a lot more that we want to do around automation – around the grant and revoke privilege process. There’s also more that we want to do for better support for different forms of integration with identity management systems.”
The Hadoop security beast isn’t something that will be figured out overnight, but piece by piece, the community is responding to the challenge. There will most certainly be more to come as new pieces of the puzzle emerge.