Avoiding Split Brainedness in HA Hadoop Clusters
The US Patent Office recently granted Zettaset a patent for the technology underlying its Hadoop high-availability offering, which prevents a “split-brain” situation where multiple master nodes think they’re in control of the Hadoop cluster. It’s a feather in the cap for Zettaset, which has also innovated in the area of Hadoop security.
The single point of failure of the Hadoop NameNode and the overall lack of native high availability (HA) features in Hadoop should not be a surprise to you. In fact, it’s been a well-documented issue for years, and one that the Apache Hadoop community has worked to address by bolstering Hadoop’s availability and resiliency.
Much progress was made with the launch of Hadoop version 2 last year, which brought automated mechanisms for handling a NameNode failover and maintaining continuous access to HDFS services. Whereas first-gen Hadoop deployments were vulnerable to losing data, Hadoop version 2 promises full protection for the entire Hadoop stack, including MapReduce, Hive, Pig, HBase, and Oozie, according to Hortonworks.
While the Hadoop community has made definite progress, there is still room in the market for vendors like Zettaset to innovate. Zettaset’s flagship offering, called Orchestrator, delivers a management layer over Hadoop with the aim of bolstering not only high availability, but security and monitoring too. In the HA realm, Zettaset aims to provide automated protection against downtime without requiring lots of manual intervention.
The Mountain View, California company says Orchestrator implements an HA failover mechanism that protects not only the NameNode, but the JobTracker, Oozie, Kerberos, Hive, and the metadata store layers as well, which it argues are not well protected by plain vanilla Hadoop distributions. Upon detecting a failure, the software automatically fails over to the backup, which is kept up to date via data synchronization. More than one backup can be designated in a “1-to-n” cascading failover setup for each protected service.
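To make the “1-to-n” cascading idea concrete, here is a minimal Python sketch. It is not Zettaset’s code: the function name `next_active`, the node names, and the health map are all hypothetical, and it assumes only that each protected service has an ordered chain of a primary followed by its backups.

```python
from typing import Dict, List, Optional


def next_active(chain: List[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first healthy node in an ordered failover chain.

    `chain` lists the primary first, then backups in cascade order.
    Nodes missing from `healthy` are treated as down (an assumption).
    """
    for node in chain:
        if healthy.get(node, False):
            return node
    return None  # every designated node is down


# Hypothetical chain for one protected service: a primary and two backups.
namenode_chain = ["nn-primary", "nn-backup-1", "nn-backup-2"]
status = {"nn-primary": False, "nn-backup-1": False, "nn-backup-2": True}
active = next_active(namenode_chain, status)
```

In this example the primary and the first backup are both marked down, so the service cascades to the second backup. A real system would also need the data synchronization the company describes, which this sketch leaves out.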
Creating fault-tolerance in computer clusters is nothing new. But in U.S. patent number 8,595,546, Zettaset explains how it went about enabling a failover mechanism for Hadoop that avoids “split-brain” syndrome by leveraging what it calls “quorum-based majority voting strategies with time-limited leases.”
A diagram of Zettaset’s split-brain resistant invention for Hadoop HA
The key to avoiding the “split-brain” syndrome, where multiple master nodes think they’re in charge of the cluster, is a “time skew” on the order of perhaps 10 or more seconds between the current master node and the other master candidates, according to Zettaset’s patent, which was granted by the US Patent and Trademark Office on November 26, 2013.
In the event of a failure, the “new master is issued a new time-limited lease, and after waiting for a period of time no less than the maximum clock skew, start the master service,” the patent reads. “This method effective prevents split brain situations between master candidates…”
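The lease-and-wait rule quoted above can be sketched in a few lines of Python. This is one reading of the quoted language, not the patented implementation: the constants, the `Lease` class, and the functions `has_majority`, `elect`, and `may_serve` are all hypothetical names, and the 30-second lease length is an assumed value. The sketch also assumes a node must stop serving once its own clock says its lease has expired.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CLOCK_SKEW = 10.0  # seconds; assumed bound, echoing the article's "10 or more seconds"
LEASE_DURATION = 30.0  # seconds; hypothetical value, not taken from the patent


@dataclass(frozen=True)
class Lease:
    holder: str
    issued_at: float
    expires_at: float


def has_majority(votes_for: int, quorum_size: int) -> bool:
    """A candidate wins only with a strict majority of the quorum."""
    return votes_for > quorum_size // 2


def elect(candidate: str, votes_for: int, quorum_size: int, now: float) -> Optional[Lease]:
    """Issue a time-limited lease if, and only if, the vote carries."""
    if not has_majority(votes_for, quorum_size):
        return None
    return Lease(candidate, now, now + LEASE_DURATION)


def may_serve(lease: Lease, local_now: float, is_new_master: bool) -> bool:
    """Decide whether the lease holder may run the master service right now."""
    if local_now >= lease.expires_at:
        return False  # an expired lease never authorizes service
    if is_new_master:
        # A newly elected master waits out the maximum clock skew so that a
        # slow-clocked predecessor has certainly seen its own lease expire.
        return local_now >= lease.issued_at + MAX_CLOCK_SKEW
    return True


lease = elect("candidate-b", votes_for=2, quorum_size=3, now=100.0)
```

With these assumed numbers, a candidate elected at time 100 holds a lease until 130 but may not start the master service before 110. The strict-majority check means two candidates cannot both win the same vote, and the wait covers the window in which the old master’s clock might lag.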
It’s all about enabling better HA for Hadoop, says Zettaset president and CEO Jim Vogt. “Our newly-patented, high-availability technology represents a major breakthrough for enterprise organizations that require and expect rock-solid dependability from their Hadoop clusters, and complements the existing open-source ecosystem,” Vogt says. “This unique capability from Zettaset, along with the most comprehensive security available for Hadoop environments, makes Hadoop truly compelling to the enterprise.”
Zettaset is not the only third-party software vendor trying to improve upon Hadoop’s HA. Indeed, all of the Hadoop distributors–from pure-play vendors like Hortonworks, Cloudera, and MapR Technologies to big-name companies like IBM, Pivotal, and Intel–are addressing the problem, in one way or another. But with IP like this behind it, Zettaset certainly appears to be helping push the state of the art, and that’s good for the Hadoop ecosystem as a whole.