MongoDB Struts Its NoSQL Stuff in NYC
When you think about giants of the technology world, MongoDB may not come to mind. But judging by the big strides this up-and-coming NoSQL database vendor is making, and the aggressive roadmap it put forth today at the third annual MongoDB World conference in New York City–including a Spark connector today and new graph functions tomorrow–it might just become one.
MongoDB is already a formidable force in the NoSQL database market. Founded as 10gen in 2007, the NYC-based company got an early jump on the big data movement and benefited from slowing momentum in relational technology to build a lead. Today MongoDB has the biggest base of paying customers (purported to be north of 2,000) and has received the most funding: $311.5 million, giving it a valuation just shy of $2 billion.
By comparison, its closest competitors in the general purpose NoSQL space (as opposed to specialized key-value stores or graph databases) are DataStax, which has 500 customers; Couchbase, which has received $146 million in funding; and MarkLogic, which also has 500 customers (yes, MarkLogic deserves to be mentioned here). All of them are eager to knock MongoDB off its perch and poach its customer base, which includes half of the Fortune 100, although there is a lot of overlap (Comcast, for example, is claimed as a customer by Couchbase, DataStax, and MongoDB, while Citibank is claimed as a customer by Couchbase, MarkLogic, and MongoDB).
But considering that NoSQL is still just a small part of the overall $35 billion database market, that’s not saying a whole lot. While document-based databases such as those from MongoDB and Couchbase are often considered for new projects, relational databases still power the long tail of installed applications. MongoDB clearly has bigger fish to fry, and to that end, it’s making strides against a company that truly is a giant in the space: Oracle (NYSE: ORCL).
Oracle still reigns supreme in terms of dollars, founders’ ownership of Hawaiian islands, and mindshare, the latter as measured by DB-Engines.com. But MongoDB is gaining ground against Big Red. According to DB-Engines.com, a website that tracks database usage, MongoDB moved up one spot over the past 12 months to become the fourth most popular database in the world, behind MySQL, SQL Server, and Oracle. Cassandra, which is backed by DataStax, sits at number 7, followed by other examples of NoSQL technology, like Redis (10th), ElasticSearch (11th), Solr (14th), HBase (15th), Splunk (18th), Neo4j (21st), Memcached (23rd), and Couchbase (24th).
Making life easy for developers is the secret to MongoDB’s success. “The whole raison d’etre of MongoDB is to unleash developer productivity,” MongoDB president and CEO Dev Ittycheria said during his keynote address at MongoDB World Tuesday morning. “We do that by not having you guys worry about designing or managing complex schemas, by not having to map an object in programming language to data sitting in tables, to not have to worry about designing cumbersome queries to get insights into your data.”
Spark Connectors and Cloud
The company made two major announcements Tuesday aimed at making the database easier to use for certain classes of users: a new cloud version of its database called MongoDB Atlas and a new Spark connector.
Atlas is a fully managed version of MongoDB that runs in the cloud. Setting up a NoSQL cluster is ridiculously easy, requiring just a few clicks of a mouse in a Web browser. MongoDB handles all the backend management stuff, including backing up the database and installing updates and fixes.
The new cloud service lets customers dynamically tweak settings–such as the size of the database volume, the number of replicated copies, the number of database shards, the geographic zone of the data center, and the speed of the processors–on the fly. This will help to eliminate a headache for administrators who today spend much of their time managing these aspects of databases. The Atlas service is available on Amazon’s AWS now and will soon be available on Microsoft Azure and Google Compute Engine.
The new Apache Spark connector, meanwhile, will bolster MongoDB’s analytic story. The company previously offered a connection to Spark by way of its Hadoop Connector. But MongoDB decided that to truly make Spark work well for its customers, it needed to build a dedicated Spark connector that could take advantage of existing features in the database.
“We learned for example that most users are working with Spark in Scala so this connector is written in Scala [instead of Java],” explains Kelly Stirman, MongoDB’s vice president of product management. “We learned people really want to take advantage of the underlying features in the database, like secondary indexes and even the aggregation framework.”
Despite its less-than-sexy name, the aggregation framework is one of MongoDB’s hidden jewels, particularly when it comes to analytics. The framework runs pipelines of transformations that are compiled into C++ operations and executed directly against the database kernel. Because the aggregated results are computed inside the database, users don’t need to move lots of data into a separate Spark cluster.
“You can push as much of the work as possible down into the underlying database where you presumably have big iron, and then you can use Spark to just work on the subset that’s appropriate for that particular analytic job,” Stirman tells Datanami.
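To make the push-down idea concrete, here is a minimal Python sketch. The `pipeline` list uses the aggregation framework's `$match` and `$group` stage syntax; the order documents, their field names, and the `run_match_group` helper are assumptions made up for illustration. The helper is a pure-Python stand-in for just these two stages so the example runs without a database, not a description of how MongoDB or the Spark connector executes them.

```python
from collections import defaultdict

# Hypothetical order documents; the field names are assumptions.
orders = [
    {"customer": "a", "status": "shipped", "amount": 40},
    {"customer": "b", "status": "shipped", "amount": 25},
    {"customer": "a", "status": "pending", "amount": 10},
    {"customer": "a", "status": "shipped", "amount": 60},
]

# Aggregation pipeline in MongoDB's stage syntax:
# keep shipped orders, then sum the amounts per customer.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
]


def run_match_group(docs, pipeline):
    """Pure-Python stand-in for the two stages above (not a general engine)."""
    match = pipeline[0]["$match"]
    group = pipeline[1]["$group"]
    key_field = group["_id"].lstrip("$")
    sum_field = group["total"]["$sum"].lstrip("$")
    totals = defaultdict(int)
    for doc in docs:
        # $match with plain values means equality on every listed field.
        if all(doc.get(k) == v for k, v in match.items()):
            totals[doc[key_field]] += doc[sum_field]
    return [{"_id": k, "total": v} for k, v in sorted(totals.items())]


reduced = run_match_group(orders, pipeline)
print(reduced)  # two summary rows instead of four raw documents
```

Against a real deployment, the same `pipeline` list could be handed to PyMongo's `collection.aggregate(pipeline)`, so that only the reduced rows would need to leave the database for further work in Spark.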
MongoDB built the Spark connector with assistance from Databricks, the company behind the open source analytic phenomenon. Databricks gave MongoDB its blessing on the Spark connector, which exposes DataFrames, the preferred method of working with data in Spark version 1.6.
“The new native MongoDB Connector for Apache Spark provides higher performance, greater ease of use, and access to more advanced Apache Spark functionality than any MongoDB connector available today,” Reynold Xin, Databricks co-founder and chief architect, said in a statement.
Machine learning is the big driver for the Spark connector, Stirman says, such as a leaderboard for a gaming application, a recommendation engine for an ecommerce site, or a fraud detection engine for credit card processing. “Those are the kinds of things you want to be able to do in real time and not wait hours or days or weeks.”
MongoDB Like Graph
The currently shipping version of MongoDB is version 3.2, which shipped in late 2015. In the fall, MongoDB plans to ship version 3.4, which will bring several promising new analytic capabilities, including recursive lookups and faceted search.
Support for recursive lookups, which is a fancy way of saying graph analytics, will allow MongoDB customers to tackle a certain class of analytic problems in a more natural way, without moving the data out of MongoDB to a specialized database.
“There are pure-play graph databases like Neo that go very, very deep but they also ask you to think about the whole world through the lens of a graph database, which is pretty hard to do,” Stirman says. “We think most people’s graph needs are much more modest. We’ll get you most of the way there with 3.4 and it will be the exception when you actually need a pure graph database to solve a particular application’s requirements.”
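A recursive lookup of the modest kind Stirman describes might look like the following Python sketch. The article does not name the operator; the sketch assumes the `$graphLookup` aggregation stage that MongoDB documents for version 3.4. The employee documents, the `reports_to` field, and the `management_chain` helper are hypothetical, and the helper only imitates the traversal in plain Python so the example runs without a database.

```python
# Hypothetical employee documents; "reports_to" is an assumed field name.
employees = [
    {"name": "Ana", "reports_to": None},
    {"name": "Ben", "reports_to": "Ana"},
    {"name": "Cy", "reports_to": "Ben"},
    {"name": "Di", "reports_to": "Cy"},
]

# Assumed stage: for each employee, repeatedly follow reports_to -> name
# within the same collection and collect every document reached.
graph_stage = {
    "$graphLookup": {
        "from": "employees",
        "startWith": "$reports_to",
        "connectFromField": "reports_to",
        "connectToField": "name",
        "as": "chain",
    }
}


def management_chain(docs, start_name):
    """Plain-Python imitation of the recursive walk for one employee.

    Returns manager names from nearest to furthest; the real stage
    returns whole documents and does not guarantee their order.
    """
    by_name = {d["name"]: d for d in docs}
    chain, seen = [], set()
    current = by_name[start_name]["reports_to"]
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        chain.append(current)
        current = by_name[current]["reports_to"]
    return chain


print(management_chain(employees, "Di"))
```

The point of the sketch is that the "graph" here is just ordinary documents pointing at one another, which is the kind of case a general-purpose database can cover without a dedicated graph engine.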
If MongoDB intends to head off Neo Technology with recursive lookups, then it’s looking to prevent MongoDB customers from installing search engines like ElasticSearch and Solr with the other major new version 3.4 feature, faceted search.
Faceted search will essentially provide a way for MongoDB to deliver very fine-grained results to users’ queries. “If you think you’ve never used faceted search before I almost guarantee you’re wrong,” MongoDB co-founder and CTO Eliot Horowitz said Tuesday during his keynote. “On every ecommerce site the left nav of the folder section, that is faceted search. You put a search into the search bar and it generates some statistics about that, such as brand and price.”
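The brand and price statistics Horowitz mentions can be sketched in Python as follows. The article does not give the syntax; the sketch assumes the `$facet`, `$sortByCount`, and `$bucket` aggregation stages that MongoDB documents for version 3.4. The product documents, the price boundaries, and the `facet_counts` helper are hypothetical, and the helper reproduces the counts in plain Python so the example runs without a database.

```python
from collections import Counter

# Hypothetical search results; names, brands, and prices are made up.
products = [
    {"name": "tent", "brand": "Acme", "price": 120},
    {"name": "stove", "brand": "Acme", "price": 45},
    {"name": "lamp", "brand": "Bolt", "price": 20},
    {"name": "pack", "brand": "Bolt", "price": 80},
    {"name": "boots", "brand": "Acme", "price": 95},
]

PRICE_BOUNDARIES = [0, 50, 100, 200]  # assumed bucket edges

# Assumed stage: compute two independent summaries over the same results.
facet_stage = {
    "$facet": {
        "by_brand": [{"$sortByCount": "$brand"}],
        "by_price": [
            {"$bucket": {"groupBy": "$price", "boundaries": PRICE_BOUNDARIES}}
        ],
    }
}


def facet_counts(docs, boundaries):
    """Plain-Python imitation of the two facets above."""
    brands = Counter(d["brand"] for d in docs)
    by_brand = [{"_id": b, "count": c} for b, c in brands.most_common()]
    by_price = []
    # Each bucket covers [low, high); empty buckets are left out.
    for low, high in zip(boundaries, boundaries[1:]):
        count = sum(1 for d in docs if low <= d["price"] < high)
        if count:
            by_price.append({"_id": low, "count": count})
    return {"by_brand": by_brand, "by_price": by_price}


facets = facet_counts(products, PRICE_BOUNDARIES)
print(facets)
```

Each facet is what an ecommerce page would render in its left-hand navigation: a count per brand and a count per price range for the current search results.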
The future would appear bright for MongoDB and its customers. The company may not yet be as “humongous” as the datasets that its customers store, but if it continues delivering features like native Spark connectors, graph operators, and advanced search functionality, it may get there someday.