Deploying Hadoop on User Namespace Containers
Hadoop is increasingly moving to the cloud, with Gartner reporting that over 50% of companies are considering a cloud-only or hybrid-cloud solution for Big Data. Altiscale has offered a high-performance, secure, multi-tenant cloud solution since 2014, with its multitenancy and performance capabilities driven by the use of namespaced Docker containers.
In my Thursday session, titled “Deploying Hadoop on user namespace containers,” I will share what I have learned over years of work on making Hadoop run effectively in the cloud.
Docker is a very popular container technology. A Docker container provides an isolated, virtual machine-like environment. Containers resemble lightweight virtual machines (VMs) but perform better than VMs, with performance close to that of bare metal. I’ll explain how to treat a container like a VM or a physical machine and how to expand its capabilities so that it can do everything a machine can do.
Docker and Elastic Scaling
At Altiscale, Hadoop is deployed on our data centers in a way that allows customers to process petabytes of data without worrying about Hadoop cluster management. Altiscale clusters grow and shrink elastically to keep pace with the customer’s compute and storage needs.
This elasticity is achieved by growing and shrinking the pool of slave nodes. Docker containers allow NodeManagers and DataNodes to launch in under a second, so the cluster can respond rapidly to shifting customer demands. Altiscale achieves greater isolation than Docker alone provides by applying our user-namespace solution on top of Docker, so that no user inside these Hadoop slaves has root privileges on the host. I’ll describe the Altiscale elastic cluster model, the design decisions behind it, and the issues I encountered and addressed.
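To make the user-namespace idea concrete, here is a minimal sketch of how Linux translates user IDs between a container and its host. The kernel exposes the mapping in /proc/<pid>/uid_map, where each line holds three numbers: the first ID inside the namespace, the first ID on the host, and the length of the range. The mapping values below (0 100000 65536) and the helper names parse_uid_map and to_host_uid are assumptions chosen for illustration, not Altiscale's actual configuration or code.

```python
from typing import List, Optional, Tuple

# Each entry mirrors one line of /proc/<pid>/uid_map:
# (first ID inside the namespace, first ID on the host, range length)
UidMap = List[Tuple[int, int, int]]


def parse_uid_map(text: str) -> UidMap:
    """Parse uid_map-style text into (inside, outside, length) tuples."""
    entries: UidMap = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got: {line!r}")
        inside, outside, length = (int(field) for field in fields)
        entries.append((inside, outside, length))
    return entries


def to_host_uid(container_uid: int, uid_map: UidMap) -> Optional[int]:
    """Translate a UID seen inside the container to the host UID.

    Returns None when the UID falls outside every mapped range,
    meaning it has no identity on the host at all.
    """
    for inside, outside, length in uid_map:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Hypothetical mapping: container IDs 0..65535 map to host IDs 100000..165535.
EXAMPLE_MAP = parse_uid_map("0 100000 65536\n")

if __name__ == "__main__":
    print(to_host_uid(0, EXAMPLE_MAP))      # 100000: container "root" is unprivileged on the host
    print(to_host_uid(1000, EXAMPLE_MAP))   # 101000
    print(to_host_uid(70000, EXAMPLE_MAP))  # None: unmapped
```

With a mapping like this, a process that runs as UID 0 inside the container is an ordinary, unprivileged user from the host's point of view, which is the property that keeps container users from holding root on the machine.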
Future Development Direction
I’ll also cover future developments that could further improve isolation and elasticity, such as nested containers, which would allow Hadoop users to launch their own containers. The session is Thursday from 11:00 to 11:40 a.m. in room 230 C. For more information, see the session description.
About the author: Abin Shahab is a Senior Software Engineer at Altiscale and a contributor to Hadoop, Docker, and LXC. Prior to joining Altiscale, Abin worked on graph databases and search engines at Guidewire, Symantec, and Vivisimo (IBM). Abin holds a Master's degree in Software Engineering from Carnegie Mellon University and a Bachelor's degree in Computer Science from the University of Arizona.