MapR Delivers Bi-Directional Replication with Distro Refresh
A new release of the MapR Distribution including Hadoop unveiled today will enable companies to perform real-time, bi-directional data replication between Hadoop clusters that are thousands of miles apart. The new table replication feature was added to MapR-DB, the NoSQL database included with the high-end edition of MapR’s commercial Hadoop offering.
As Hadoop adoption grows, companies are finding it increasingly difficult to ensure that they’re acting on the latest, freshest data. This fast-data problem is particularly evident in organizations that run multiple Hadoop clusters and need to ensure that the analytic apps running on clusters have the best data available to them.
MapR says it has addressed this challenge with version 4.1 of its MapR Distribution, unveiled today at the Strata + Hadoop World conference in MapR’s hometown of San Jose, California. The new table replication feature in the MapR-DB NoSQL database delivers real-time data replication, thereby enabling customers to maintain multiple, active replica clusters in multiple data centers around the world.
“This is a huge step,” says MapR’s chief marketing officer Jack Norris. “The big focus here is organizations moving to impact business as it happens, which is combining operational data with in-line analytics…It’s about understanding what’s going on at the moment and then optimizing for revenue or cost efficiency or risk mitigation. This real-time capability [will enable organizations] to have a distributed, globally consistent footprint.”
While it might sound nice for an organization to have a single Hadoop cluster to store all of its data, that’s not necessarily how things evolve in the real world. Large multi-national organizations commonly spin up multiple Hadoop clusters to serve specific departments, geographies, or lines of business.
MapR’s new table replication facility allows those clusters to be configured in an “active-active” manner, thereby enabling operational data to be stored and processed close to the users or devices they serve, while allowing live data to be streamed to other clusters in real time.
Norris says the replication facility is a result of some of the earlier investments MapR put into its distribution, including the random read/write file system and the integrated NoSQL database. Customers can use the replication feature in any configuration of data centers, including having one centralized Hadoop cluster used for analytics or replicating data across a daisy chain of data centers.
“If the data centers are close enough, you can do synchronous replication,” he tells Datanami. “If they’re geographically distributed, it’s asynchronous with transactional integrity, so if the connection goes down, [the transaction updates] will be applied in the correct order and resumed when you reestablish the connection. It’s a high-end, utility-grade solution.”
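As a rough illustration of the asynchronous behavior Norris describes, the following Python sketch buffers updates while a link is down and replays them in their original order once the connection is restored. It is not MapR’s implementation or API. The class names (`Replica`, `ReplicationLink`), the per-update sequence numbers, and the in-memory dictionaries standing in for tables are all assumptions made for illustration.

```python
from collections import deque


class Replica:
    """Hypothetical remote table copy that applies updates strictly in order."""

    def __init__(self):
        self.rows = {}
        self.last_applied = 0

    def apply(self, seq, key, value):
        # Reject anything that would skip or repeat an update.
        if seq != self.last_applied + 1:
            raise ValueError(f"out-of-order update {seq}")
        self.rows[key] = value
        self.last_applied = seq


class ReplicationLink:
    """Hypothetical asynchronous link from a local table to one replica."""

    def __init__(self, replica):
        self.replica = replica
        self.connected = True
        self.pending = deque()  # updates not yet applied remotely, oldest first
        self.next_seq = 1

    def write(self, local, key, value):
        # The local write succeeds immediately; replication happens afterwards.
        local[key] = value
        self.pending.append((self.next_seq, key, value))
        self.next_seq += 1
        self.flush()

    def flush(self):
        # Ship buffered updates oldest-first while the link is up.
        while self.connected and self.pending:
            seq, key, value = self.pending[0]
            self.replica.apply(seq, key, value)
            self.pending.popleft()  # drop only after a successful apply

    def reconnect(self):
        self.connected = True
        self.flush()


local = {}
replica = Replica()
link = ReplicationLink(replica)

link.write(local, "user:1", "a")   # replicated right away
link.connected = False             # simulate a dropped connection
link.write(local, "user:1", "b")   # buffered
link.write(local, "user:2", "c")   # buffered
rows_during_outage = dict(replica.rows)

link.reconnect()                   # buffered updates replay in order
```

During the simulated outage the replica still holds only the first write. After `reconnect()` it matches the local table, because the two buffered updates are applied in the order they were made.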
Version 4.1 also brings a new POSIX client for loading data into Hadoop. Some customers are well served by using the standard HDFS API, while MapR’s standard NFS-based client provides additional flexibility for mounting data. The new POSIX client provides a third option. MapR says its built-in compression, parallelism, and encryption features will be particularly attractive for organizations that need to meet stringent SLAs on ingesting data into the Hadoop cluster.
Lastly, MapR has added a new C API for MapR-DB, thereby enabling developers who program in C to develop for its Hadoop cluster.
In a separate announcement, MapR unveiled three Quick Start Solutions for its Hadoop distro. The new solutions will deliver pre-configured application templates for three common use cases: Data Warehouse Optimization and Analytics, Security Log Analytics, and Recommendation Engines. The Quick Start Solutions start at $30,000.