April 2, 2019

MapR to Autoscale Spark and Drill Via Prebuilt Kubernetes Containers

MapR Technologies today announced a technology preview of pre-built containers for Kubernetes that will give customers new capabilities for dynamically scaling their containerized Spark and Drill applications based on demand.

MapR Technologies got into the Kubernetes game last year when it launched the K8S storage driver, which allowed containerized applications running in Kubernetes to access and store data on the MapR data platform. That storage driver, now called the Container Storage Interface (CSI) driver, laid the groundwork for today’s announcement, which propels the Kubernetes-on-MapR story forward into the realm of dynamic resource allocation.

“What we’re announcing today is the next step towards that,” says Suzy Visvanathan, MapR’s senior director of product management. “Kubernetes already offers you the rudiment of scaling the applications automatically….What we are doing is taking that concept from Kubernetes and offering add-on features to it.”

MapR customers are already beginning to work with Kubernetes, the open source orchestration software that Google developed to automate the provisioning of public cloud data center resources to applications based on demand. Armed with the new CSI driver that shipped in early 2019, customers can use Kubernetes and Docker technologies to containerize applications on MapR.


The company decided to offer clients pre-built containers for Spark and Drill in part because it ships modified versions of Spark and Drill that have been optimized to run on the MapR platform, Visvanathan tells Datanami. The other reason is that the pre-built containers from MapR will be able to dynamically scale to meet demand.

The company plans to ship pre-built containers for Spark and Drill in several sizes, including small, medium, and large, according to Visvanathan. Each size corresponds to a set amount of CPU and RAM. Customers pick the container that best matches what they expect users will need, and Kubernetes and the containers take it from there to adapt to fluctuating demand.

“All the end user has to know is they can use these sizes to create a Spark container that is managed by Kubernetes, and they also will choose the fact that yes I want the auto-scale features,” Visvanathan says. “Then it is not just blindly auto-scaling the containers, but auto-scaling the containers based on the resource utilization that was specified for those Spark jobs.”
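To make the idea concrete, here is a minimal Python sketch of sizing presets combined with utilization-based scaling. It is not MapR's implementation, which the article does not describe. The CPU and RAM figures in SIZES are invented for illustration, and the scaling rule is the one documented for the stock Kubernetes Horizontal Pod Autoscaler (desired replicas = ceil(current replicas x current metric / target metric), with a 10 percent tolerance band), used here as an assumed stand-in for whatever MapR's add-ons do.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSize:
    """Hypothetical resource preset; the article gives no actual values."""
    cpu_cores: int
    memory_gib: int


# Illustrative numbers only, not MapR's real small/medium/large definitions.
SIZES = {
    "small": ContainerSize(cpu_cores=1, memory_gib=4),
    "medium": ContainerSize(cpu_cores=2, memory_gib=8),
    "large": ContainerSize(cpu_cores=4, memory_gib=16),
}


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Return a replica count from observed vs. target utilization (in percent).

    Follows the rule documented for the Kubernetes Horizontal Pod Autoscaler:
    ceil(current_replicas * current / target), clamped to [min, max].
    """
    ratio = current_utilization / target_utilization
    # Skip scaling when utilization is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))
```

For example, desired_replicas(4, 90, 60) returns 6: four containers running at 90 percent CPU against a 60 percent target grow to six, while a reading of 62 percent falls inside the tolerance band and leaves the count at four.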

Without the pre-built containers, customers will have to manually scale their Kubernetes containers, Visvanathan says. That’s not an issue for companies that are just starting to play around with the technology. But it’s a potential deal-breaker for companies that are banking on Kubernetes being an enterprise orchestration layer managing thousands of nodes and petabytes’ worth of data, running on-premises, in the cloud, on the edge, and in hybrid deployment modes, she says.

“If there’s a customer who envisions running a few hundred containers through the lifecycle of their business, then no, they don’t really need auto-scaling features. In fact they don’t need Kubernetes either. They can manually create containers and just manage it,” Visvanathan says.

“However, we are seeing a lot of customers who are talking about millions of containers,” she continues. “They are talking about tens of millions of Kubernetes pods. So in those cases, we do not advise them to do it manually. In fact we are actively…positioning Kubernetes for them and actively then tell them ‘Take on our add-ons, with our Spark and our Drill that we are giving.’”

All of the major cloud providers have standardized on Kubernetes for orchestration of containers, which makes Kubernetes both a crucial link and a key enabling technology in support of an organization’s hybrid deployment strategy for big data. MapR, which runs in Amazon, Azure, and Alibaba Cloud (with Google Cloud coming soon), is pursuing a hybrid cloud strategy, along with many other big data platform companies.

MapR plans to ship the pre-built containers for Spark, Drill, and the Hive metastore by the end of the second quarter. At that time, MapR may also have something to share in terms of a management console that lets customers control hybrid scenarios that involve containers moving from on-prem to cloud environments, Visvanathan says.

“As 2019 progresses and 2020 comes around, you will see that these kind of deployments are going to become more normal,” she says. “And I actually envision millions of containers to be running in production applications soon.”
