Hadoop Security: Still a Lot of Work to Do
Hadoop is quickly gaining momentum as the foundation for a new class of massively parallel applications that work against petabytes of data. But as customers grow their test clusters beyond a handful of nodes, and with the bad taste of the Target breach still hanging around, the lack of security controls threatens to derail large production Hadoop deployments.
Hadoop comes from open source, and there’s no shortage of open source Apache projects aimed at building security functionality into the core Hadoop stack. A quick list would include Apache Knox for authentication, Apache Falcon for data governance, and Apache Sentry for role-based authorization. MIT gave us Kerberos for authentication, while Intel gave us Project Rhino, a GitHub-hosted effort to develop a Hadoop framework for encryption, key management, and authorization.
But stitching these components together in a comprehensive and cohesive way is not so easy. Apache Sentry, for example, requires working with XML files, which is not something everybody will feel comfortable with. Many of these open source projects are quite green still and lack polish.
This hole in open source Hadoop security has provided room for third-party vendors to innovate, and has also driven the major distributors to take matters into their own hands. Hortonworks kicked things off in mid-May, when it bought XA Secure, an 18-month-old company that provided centralized administration of security policies over multiple Hadoop components. Hortonworks says it plans to release XA Secure’s software into the open source community and to get an Apache project incubated by the second half of the year. It will also incorporate XA Secure capabilities into HDP.
Cloudera followed that up in early June with its acquisition of Gazzang, which developed encryption software for Hadoop. Cloudera says it plans to put the Gazzang development team together with the Intel team behind Project Rhino as part of its “Cloudera Center for Security Excellence.” As part of that initiative, it plans to incorporate Gazzang’s technology to enable “follow the data” authorization and encryption policies within CDH, as well as providing continuous vulnerability assessments to customers, and boosting the security of partner applications through the use of APIs.
These acquisitions caught the eyes of existing players in the Hadoop security space, including Mark Cusack, the principal architect at RainStor, which develops an archiving solution for Hadoop that includes encryption and data masking as built-in features. Many of its customers are large financial services firms and phone companies (like T-Mobile, which we wrote about earlier this week) that require adherence to industry regulations.
“If you look at how Hortonworks and Cloudera are buying up security real estate right now, you can see that they’re stressing this,” Cusack says. “If you look at security problems around Hadoop right now it’s because there’s a large gap, and there are a large number of pieces of Hadoop that you need to fit together to be secure.”
Just look at Apache Knox, he says. “That was a tacit admission from the Hadoop community that Hadoop security has failed. Because what they’re saying is all we can do is secure the perimeter and what’s going on behind that perimeter it is fair game,” he says.
Jim Vogt, the CEO of Zettaset, has also been watching the security-related acquisitions with interest. “People are starting to give it attention,” he tells Datanami. “The elephant in the room is that the market is still early. And mainly that’s because a lot of these features are lacking. The biggest one is security.”
Zettaset has been selling its flagship Orchestrator product to the Hadoop community for the past two years. The software not only provides encryption for data at rest and in motion, but also provides high availability protection and helps automate the management and deployment of clusters, which are also crucial concerns. The company plans to support Hadoop 2 and YARN with its software by the end of the month.
Encryption is a hot topic in Hadoop at the moment, especially in regulated industries such as financial services, healthcare, and online retail. Another Hadoop encryption provider to keep your eyes on is Dataguise, which provides encryption not just for Hadoop but for a range of enterprise platforms. Yesterday, the company launched a data governance solution that allows companies to declare policies, discover sensitive data, view and track entitlements, and audit access to data across all major platforms, Hadoop included.
The concept of Hadoop as a grand data lake where processing comes to the data and data no longer needs to be moved may be an interesting vision, but it’s not reality at the moment. Therefore, data policies need to be set and enforced in conjunction with other major platforms. “CISOs should not treat big data security in isolation, but require policies that encompass all data silos to avoid security chaos,” Gartner analysts Brian Lowans and Earl Perkins jointly stated.
While Cloudera seeks to build out its own Hadoop security infrastructure and Hortonworks stresses the open source approach, third-party vendors appear to be gaining ground. While IBM offers several encryption and data masking products to its Hadoop users, it also partners with Zaloni for additional functionality.
Zettaset’s Vogt sees Cloudera/Intel/Gazzang as a considerable threat to his growing business (it currently has about 35 to 40 customers). But he doesn’t see open source projects getting close anytime soon.
It would be more convenient for the customer if security were developed in open source, Vogt says. “The problem is that just takes too long,” he says. “We’re about 18 months ahead [of open source] in terms of what we’re shipping.”
Encryption and key management, in particular, are tough nuts to crack when it comes to Hadoop. “We’ve been working on this for two years and there’s quite a bit involved in terms of solving some of the distributed architecture issues around security,” Vogt says. “That’s the secret sauce for us and that’s where we’re actively patenting technology.”
By the time open source catches up, Zettaset will have sped up its Hadoop encryption and delivered finer-grained management, Vogt says. “It’s going to take some time for them to get the amount of integration and scalability that we built natively into the infrastructure,” he says. “They’re just starting to take note that these are really some big issues with customers, and they have to have an answer for security, quite frankly.”