September 29, 2015

MapR Gives Hadoop Distro More NoSQL Smarts

Alex Woodie

Hadoop and NoSQL databases share some similarities, but the platforms typically live within different levels of the big data spectrum. Now MapR Technologies is working to break those barriers down by adding a major piece of NoSQL functionality to its Hadoop distribution.

At the Strata + Hadoop World conference today, MapR Technologies announced that it’s now supporting the storage of JSON documents in MapR-DB, the enterprise NoSQL data store that ships with its Hadoop distribution. As a spruced up version of the HBase key-value store, the MapR-DB has always provided some NoSQL functionality, particularly for serving large amounts of randomly accessed data, which is not something HDFS is particularly good at doing.

But the addition of a JSON document store gives MapR’s distribution of Hadoop the same basic functionality as you’d find in commercial document-oriented NoSQL databases, such as those from MongoDB or Couchbase. (The MapR-DB already had wide-column NoSQL functionality, which is most visibly implemented in the Apache Cassandra NoSQL database.)

“What we’re announcing is the first in-Hadoop document database,” says MapR chief marketing officer Jack Norris. “It has quite a number of benefits. You’re going to see a number of developers using JSON today jump at a scalable JSON solution.”

Norris identified several ways they could gain competitive advantages by adopting the native JSON data store in its Hadoop distribution. For starters, it scales better than the document-oriented data stores.

“It picks up where some of the tools today have well-understood scalability issues,” he says. “You can take Mongo output and put it into MapR quite easily. The issue is…running Mongo on Hadoop doesn’t do anything to change the scale issue of Mongo. You still have the issue of, how does it effectively scale when the processing exceeds memory, and so forth. So there are some issues that don’t go away just by running Mongo on the same nodes as Hadoop.”

Norris says the new JSON data store will also help to “collapse the cycle” of insight by removing barriers between the two types of storage and processing mechanisms.

It’s common for customers to use both Hadoop and NoSQL databases and to move data back and forth between them. While Hadoop excels at finding insights hidden in data, those insights are useless unless they’re implemented in some way. In many cases, the business insights are often implemented via NoSQL databases, which are the scalable backend systems of record for modern Web and mobile apps.

“Instead of taking that data [from operational systems] and shuffling it off to another platform and other solutions that focus more on the analytic aspects, you’re able to do that in the same cluster,” Norris says. “It’s about providing the flexibility and agility to organizations as they leverage things and being to do the analyses in place without having” to move the data to different platforms.

Having a JSON data store integrated so closely into Hadoop will also make life easier for developers, he says. “We’ve seen a lot of excitement from developers,” Norris says. “They say ‘If you have native JSON support then I don’t have to serialize things–I can leverage this directly and it just saves a lot of time and moving parts.’ We think this is going to be very impactful.”

MapR has already offered some support for JSON, most notably as a data format used by Apache Drill, the SQL query engine that it sponsors. But the new support for JSON inside MapR-DB is a whole different ballgame.

“If you look at what’s going on in organizations, JSON as an output is really exploding,” Norris says. “It’s become the defacto data interchange format for Web apps. It’s an output of many IoT machine generated sources. It’s definitely one of the fastest growing data sources, and the ability to leverage that directly, without requiring IT processing as a precursor–that’s significant.”

MapR sees the JSON data store being implemented in a number of different industries. In the Web and retail worlds, it could help integrate analytics with operational systems, such as by serving real-time insights or highly personalized and targeted ads to Web and mobile customers. It could also be used to help detect of fraudulent transactions in real-time at the time of purchase, thereby putting a stop to fraud.

MapR did a survey recently that found that 18 percent of users had 50 or more applications running on a single Hadoop clusters. “One of the big advantages of Hadoop is to reduce the sprawl of data silos,” Norris says. “Then you look at some of the environments where you have separate Hadoop and separate NoSQL or HBase clusters, and the batch extract and loading between them. You’re kind of replicating some of the application silos that existed previously.”

Adding a JSON document to the Hadoop cluster could help to tamp down the data-sprawl problem. “That really drives agility and lowers total overall cost and makes data more impactful in an organization,” he says.

The JSON data store supports the Open JSON Application Interface (OJAITM), a general purpose JSON access layer. MapR’s implementation features bindings for Java, Python, and Node.js. MapR shipped a developer preview of the JSON store last week that contains the bindings, the API, and sample code.

When it’s ready for GA (general availability), which is expected by the end of the year, MapR’s JSON data store will be available in all editions of its Hadoop distribution, including the free community edition. Customers buying the enterprise version will get advanced features, such as support for replicating JSON data among clusters running in different data centers.

MapR Delivers Bi-Directional Replication with Distro Refresh

Applications: Data Mining, Enterprise Analytics

Technologies: Frameworks

Sectors: Financial Services, Retail

Vendors: Couchbase, MapR, MongoDB

Tags: big data, cassandra, Couchbase, document-store, Hadoop, json, mapr, MapR-DB, Mongo, NoSQL

Only registered users may comment. Register using the form below.

Check off newsletters you would like to receive*
- HPCwire
- EnterpriseTech
- Datanami
- Technology Conferences & Events
- Advanced Computing Job Bank
- Technology Product Showcase
Email*
Name*
First Last
Organization*
Job Function*
Industry*
Country*
City*
State*
Province*
- Please check here to receive valuable email offers from Datanami on behalf of our select partners.

MapR Gives Hadoop Distro More NoSQL Smarts

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

May 10, 2024

May 9, 2024

May 8, 2024

May 7, 2024

Sponsored Partner Content

Get your Data AI Ready – Celebrate One Year of Deep Dish Data Virtual Series!

Supercharge Your Data Lake with Spark 3.3

Learn How to Build a Custom Chatbot Using a RAG Workflow in Minutes [Hands-on Demo]

Overcome ETL Bottlenecks with Metadata-driven Integration for the AI Era [Free Guide]

Gartner® Hype Cycle™ for Analytics and Business Intelligence 2023

The Art of Mastering Data Quality for AI and Analytics

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Top 6 Strategies for Reducing Data Warehouse Costs

Building an Operational Data Warehouse for Real-time Analytics

Sponsored Multimedia

The Power of DataOps: Bring Automation to Life
No Comments

Tactical Steps for Cloud Migration
No Comments

Immuta Data Access Platform
No Comments

Data Mesh: Fact or Fiction?
No Comments

Contributors

Featured Events

AI & Big Data Expo North America 2024

CDAO Canada Public Sector 2024

AI Hardware & Edge AI Summit Europe

AI Hardware & Edge AI Summit 2024

CDAO Government 2024

MapR Gives Hadoop Distro More NoSQL Smarts

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

May 10, 2024

May 9, 2024

May 8, 2024

May 7, 2024

Most Read Features

Most Read News In Brief

Most Read This Just In

Sponsored Partner Content

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Sponsored Multimedia

Contributors

Featured Events

Share

Copy short link