October 24, 2012

MapR Traces New Routes for HBase

Nicole Hemsoth

It’s showdown mode for the Hadoop distro vendors who have gathered in New York City for this year’s Strata Conference and Hadoop World 2012 event. Among the companies vying to spin up the best platform for a growing community of Hadoop users is MapR, which hosted an event just a few blocks from the main action today. The event, hosted in conjunction with Google, centered on the company’s newest platform refresh, M7.

Each of the vendors has tackled a specific component of the Hadoop puzzle in an effort to appeal to the widest base of users, making the task of comparing the various Hadoop distributions even trickier.

Of course, if you ask MapR, there’s no trick about it; they leaped at the opportunity to illustrate what separates their enterprise-grade services from the others. Among other things, they claim their distinguishing features are NFS access, a clever approach to fault tolerance and reliability, and some enhancements to boost performance and scalability.

At the root of some of their recent work for M7 on the speed, scale and access fronts is HBase, which was where they put the bulk of their development efforts in advance of today’s announcement of their boosted platform. The last enterprise distribution, M5, addressed reliability and access shortfalls, but the newest release targets performance more distinctly.

According to MapR, HBase is a natural target area since it’s becoming a stable part of many production environments. The company says that around 40% of Hadoop users opt for HBase as their touchstone non-relational distributed database, in part because of the simple fact that it runs natively on top of HDFS in a Hadoop cluster environment. It’s not that there aren’t other NoSQL options out there, however, said MapR’s Jack Norris.

Despite its longevity and decent adoption level, HBase still has the reputation of an immature database approach, but MapR thinks they’ve pinned down a few tricks on the recovery side, including providing robust snapshot and mirroring capabilities. MapR’s lead software architect demonstrated the mirroring and snapshot functionality for us during the event today, pointing to the relative speed and ease with which one could immediately pull up snapshots and start rolling ahead with the application again. On top of that, the company claims that even with concurrent, repeated hardware or software outages, applications will keep running without admin alarm bells demanding immediate attention.

The tweaks are not just about reliability and failover; performance optimization is the key to this release. HBase tends to go through several processes that generate a lot of I/O overhead, so the team tried to eliminate these mini-bottlenecks, and claims some significant performance increases over the last (M5) release. MapR says that, among other approaches, they’ve managed to eliminate the need for compactions, which means M7 can deliver more uniform and consistent performance. By using data structures that minimize the read- and write-amplification factor, they say, inserts and updates are much faster. They add that since M7 also supports in-memory columns, users have more options to increase database performance.
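To give a rough sense of why compactions matter for write amplification, here is a minimal sketch in Python. It is a toy model and not HBase's or MapR's actual algorithm: the function name `write_amplification` and the `flushes` and `fanout` parameters are assumptions made for illustration. Each flush writes one unit of data, and whenever `fanout` files of the same size pile up, they are merged and rewritten as one larger file.

```python
def write_amplification(flushes: int, fanout: int) -> float:
    """Toy model of tiered compaction (hypothetical, for illustration only).

    Each flush writes one unit of data to disk. Whenever `fanout` files
    exist in the same tier, they are merged into a single file in the next
    tier, which rewrites every unit they contain.
    Returns total units written divided by units of user data.
    """
    if flushes < 1 or fanout < 2:
        raise ValueError("need flushes >= 1 and fanout >= 2")

    files_per_tier = {}  # tier index -> number of files currently in that tier
    written = 0          # total units written to disk, flushes plus merges

    for _ in range(flushes):
        written += 1  # the flush itself
        tier = 0
        files_per_tier[tier] = files_per_tier.get(tier, 0) + 1

        # Cascade merges upward while a tier is full.
        while files_per_tier[tier] == fanout:
            files_per_tier[tier] = 0
            written += fanout ** (tier + 1)  # size of the merged file
            tier += 1
            files_per_tier[tier] = files_per_tier.get(tier, 0) + 1

    return written / flushes


if __name__ == "__main__":
    print(write_amplification(8, 2))    # frequent merges
    print(write_amplification(8, 100))  # no merge is ever triggered
```

In this model, eight flushes with a fan-out of 2 give a factor of 4.0, meaning each unit of user data is written to disk four times, while a store that never merges stays at 1.0. Real systems trade this off against read cost, which the sketch ignores.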

“If you look at HBase in the context of the loads of other NoSQL databases that are out there, we think we have an advantage in terms of offering better scalability, especially when you look at MongoDB or Cassandra, for example,” Norris told us today. He pointed to the scalability tweaks in M7 that he claims allow users to handle more than a trillion tables. This scalability is enhanced by the addition of more column families and expanded row and cell sizes.

Of course, none of this is useful without the ability to manage effectively. The company says that M7 greatly simplifies HBase administration by ensuring there are no separate processes to monitor and manage, no manual compactions, no manual region merges, no pre-splitting, no manual database repair operations and no downtime for standard maintenance.

Some have stated that the problem with MapR’s approach is that it creates a big data “lock-in” situation via the proprietary replacement of HDFS within its distro. The company was careful to skirt this criticism, noting that the NFS access capabilities actually provide a more open environment than users can get with the other distros. Further, the company says that when large-scale customers looking for a highly reliable platform evaluate its technology, they are concerned with the best solution for the job. In other words, what they seem to be suggesting is that the lock-in creates a tricky situation, but if the performance, reliability and access are solid enough for what the user is trying to accomplish, it’s a price they’ll pay.

Despite the clear competition in the ever-growing ecosystem around Hadoop, Norris said that the company’s future is looking bright, especially as organizations look to put Hadoop into production across an ever-increasing range of mission-critical uses. He also pulled some numbers from job giant Indeed.com showing the steep climb in the number of jobs requiring some familiarity with Hadoop as a sign of growth, noting that this also means organizations looking at Hadoop want developers who can do more with less time and effort, and this is their sweet spot in Norris’ view.

The main drivers for Hadoop adoption overall are strong, said Norris, pointing to the reduced cost of storage as one of the most basic. “If you look at the costs of storage alone, companies are being given all the incentive they need to keep all their data.” Beyond that, the fact that users are no longer painted into the corners of their own purpose-built models and questions is another driver. With Hadoop, it’s not necessary to know beforehand what questions you want to ask of your data.

While MapR is seeing a clear uptick in interest around Hadoop, they are confident that users who are evaluating the range of solutions around the platform are going to opt in favor of reliability and scalability at this point. These are two features the company targeted with this update—but the performance piece is where it will get really interesting across the entire ecosystem with next year’s presumed round of distro upgrades.

