Project Myriad Brings Hadoop Closer to Mesos
One of the challenges of running Hadoop is resource management. Spinning up and managing hundreds, if not tens of thousands, of server nodes in a Hadoop cluster (and spinning them down, moving them, and so on) is far too hard to do manually. Automation must come to the table to help Hadoop take the next step forward in its evolution. The big question is how it will unfold.
One answer to that question came to the forefront yesterday when a group of companies led by MapR Technologies and Mesosphere unveiled Myriad, a new project that aims to enable Hadoop jobs running under Apache YARN to be managed using the Mesos resource manager for data centers.
The goal of the Myriad project, which is hosted on GitHub and counts EBay and Twitter as contributors, is to deliver code that effectively unites Apache YARN with Apache Mesos, the cluster resource manager developed at UC Berkeley's AMPLab that forms the core of Mesosphere's data center operating system (DCOS). According to Mesosphere, DCOS is a new kind of operating system that organizes all of one's machines, virtual machines, and cloud instances into a single pool of shared resources. It runs atop Linux, and Mesos is already used in production by Twitter, Netflix, and Airbnb.
With Myriad, any YARN-compatible Hadoop job, such as Spark, MapReduce, Pig, or Hive, will be able to run on the same hardware as non-Hadoop applications. Those could include streaming applications such as Storm or Kafka, developer tools such as Jenkins, HPC jobs running under MPI, or plain old Web server workloads.
Myriad brings together both major resource managers for Hadoop and other important apps, says Florian Leibert, CEO and co-founder of Mesosphere. “Big data developers no longer have to choose between YARN and Mesos for managing clusters,” Leibert says in a press release. “Myriad allows you to run both, and to run all of your big data workloads and distributed applications and systems on a single pool of resources.”
Jack Norris, chief marketing officer for MapR, says Myriad delivers the tools that Hadoop users are asking for in the area of cluster management. “One of the motivations with EBay and Twitter is that they have extensive use of Hadoop and they have Web servers that are provisioned for peak and have long periods of low utilization,” he tells Datanami. “The ability to use Myriad to fill those long periods of low utilization with some Hadoop workload and then pull those off in anticipation of peak demand, that provides additional efficiency within the data center.”
Obviously, not every Hadoop user is going to be so heavily invested in data center hardware that they need Myriad to unite operational and analytic workloads on the same cluster. Companies like Twitter and EBay have hundreds of thousands of nodes to manage, if not millions, so even small percentage gains in efficiency translate into lots of dollars saved. A similar dynamic is at work at HPC sites, where organizations want to start incorporating big data analytic technologies like Hadoop and Spark but are loath to dedicate their entire supercomputer to such tasks.
But in the long run, the benefit that comes from provisioning the same hardware in multiple different ways is clearly in the cards for all types of applications, and will help at any scale. The burgeoning diversity of big data applications, from the Hadoop/YARN family and NoSQL databases to in-memory data grids and real-time streaming systems, will benefit from the more fluid and flexible methods of deployment that Myriad aims to deliver.
“Combining Mesos, which is really good at resource management in general and has pretty good low-level granularity in terms of disk and network and CPU and memory, with YARN, which is really good at managing Hadoop resources but lacks some of that granular control and doesn’t really work across non-Hadoop resources, gives you this dynamic capability to configure YARN and have virtual Hadoop clusters in the data center,” Norris adds. “And it’s a completely open-source project that will work across different Hadoop distributions. It’s not limited to MapR.”
Meanwhile, the folks behind Project Myriad plan to submit it to the Apache Software Foundation's Incubator by the end of the first quarter of 2015.