MongoDB Pivots to Analytics
MongoDB wasn’t originally designed to power advanced analytics. But the frustration that people have with Hadoop data lakes and the desire to get real-time answers from the database driving their applications are leading the company to build more analytic capabilities directly into the NoSQL database.
Analytics was a major theme this week at MongoDB World in Chicago, where top executives discussed the upcoming analytic features that will be added to the database. While transactional capabilities have dominated up to this point, the growing chorus for analytics is becoming hard for MongoDB to ignore.
“It’s a function of customer demand,” says MongoDB CEO Dev Ittycheria. “What we’re seeing is a lot of these investments in data lakes have not really yielded the success that they thought [they would get]. The data lake was supposed to be a panacea to these analytics problems, and they’re realizing it’s not.”
The problem with data lakes is threefold. “One, they’re very complex. You need a bunch of data scientists on your staff,” Ittycheria says. “And two, they’re expensive. Three, the performance of these lakes is not great. So it takes a long time to get answers back.”
The lack of return on investments in Hadoop data lakes is leading customers to seek other solutions. “People are saying we’ve already thrown a lot of money at the problem,” Ittycheria tells Datanami. “We’re getting a lot of customers saying I want to operationalize my data lake by putting a lot of the analytics on MongoDB because [that’s where] you have the latest and more accurate data sitting.”
That’s not to say that the company is going to stop working on OLTP features in favor of OLAP features. According to Ittycheria, the company is still focused on OLTP. “But I should be careful,” he added. “There’s definitely a merging in this space, so real-time analytics is a thrust that we’re seeing a lot of customers use MongoDB for. But if you’re doing some esoteric analytics and doing some esoteric queries, you’re not going to use MongoDB for that. You’re going to use Cloudera or MapR or Hortonworks.”
The company already has some analytic capabilities. For starters, it supports Apache Spark, the uber popular data science framework that’s the darling of data scientists, engineers, and analysts everywhere. It also offers a SQL-based BI Connector that lets users explore their MongoDB data through business intelligence tools from Tableau, Qlik, and others. And it offers a graph query engine, enabled through MongoDB’s pluggable architecture, that lets users explore connections among different entities stored in the database.
But the database will be getting more analytic features in the future to support customer demand. “More and more people are doing real-time analytics directly out of the database,” says Eliot Horowitz, MongoDB co-founder and CTO.
This week at MongoDB World, the company unveiled the upcoming Charts capability in version 3.6, which is due out this fall. Charts will allow developers to explore their data directly from a Web-based interface connected to the document store, as opposed to viewing it through the potentially distorting SQL lens of a BI tool.
“It comes up quite a bit from our users, where they want something that feels like a document,” Horowitz tells Datanami. “Developers were getting frustrated that they couldn’t look at their data easily. They can use Tableau. That works. But especially for developers it doesn’t work that well because they’re losing all the richness of the data they’re working with.”
Losing the richness of the document model can be infuriating for a developer, Horowitz says. “You’re looking at the document. You’re dealing with this every day. But you can’t analyze the document,” he says. “You’ve got to analyze this other thing that kind of looks like it, but not quite. It’s a mess.”
Horowitz didn’t think the company would have to build this. “I thought somebody would do it by now,” he says. “I thought somebody would build a really great native BI tool. But since it doesn’t exist and the market has been demanding it, we decided to do it ourselves.”
While MongoDB doesn’t have plans to power heavy-duty machine learning or deep learning workloads, it’s often the repository for the data that’s crunched using those algorithms. “We are definitely moving more and more into the analytics space,” Horowitz says. “We have a pretty rich roadmap to improve the large-scale analytics that people are doing.”
Besides Charts, there are several new features in the upcoming release of MongoDB 3.6 aimed at analytic use cases, including a new expressive lookup feature that lets users do more kinds of joins and subqueries, and a new pipeline builder that gives users a visual way to build aggregation pipelines. Many of these features will build on Stitch, the new backend as a service unveiled this week that’s aimed at simplifying busywork for developers.
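For a sense of what an expressive lookup looks like in practice, here is a minimal Python sketch that only builds the aggregation pipeline as plain data. The `$lookup` stage with `let` and `pipeline` fields is the form MongoDB 3.6 introduced; the collection names (`orders`, `warehouses`) and field names are hypothetical and chosen purely for illustration.

```python
# Hypothetical schema, assumed for illustration only:
#   orders:     {item, ordered}
#   warehouses: {stock_item, instock}

def orders_with_stock_pipeline():
    """Build a pipeline that joins each order to warehouses able to fill it."""
    return [
        {
            "$lookup": {
                "from": "warehouses",
                # Expose fields of the outer (orders) document to the subquery.
                "let": {"order_item": "$item", "order_qty": "$ordered"},
                # The subquery runs against "warehouses" for each order.
                "pipeline": [
                    {
                        "$match": {
                            "$expr": {
                                "$and": [
                                    # Join condition 1: same item.
                                    {"$eq": ["$stock_item", "$$order_item"]},
                                    # Join condition 2: enough stock on hand.
                                    {"$gte": ["$instock", "$$order_qty"]},
                                ]
                            }
                        }
                    },
                    # Drop fields the caller does not need from each match.
                    {"$project": {"_id": 0, "stock_item": 0}},
                ],
                # Matching warehouse documents land in this array field.
                "as": "stockdata",
            }
        }
    ]


pipeline = orders_with_stock_pipeline()
```

With a driver such as PyMongo, a list like this would be passed to the collection's `aggregate` method, for example `db.orders.aggregate(pipeline)`, where `db` is a database handle the application has already opened.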
Horowitz shared some other plans. “Longer term, we’re looking at things like making a parallelized query execution engine and a column store,” he says.
The column store will speed up analytics by storing the data in a more efficient format. Most of the major analytical relational databases, such as HPE’s Vertica and Teradata, have column stores underlying their data model.
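The general idea behind a column store can be shown with a toy Python sketch. It says nothing about how MongoDB would implement the feature; the records and field names are made up, and the point is only that an aggregate over one field needs to scan that field's values rather than every whole record.

```python
# Made-up sample records, stored row by row (one dict per record).
rows = [
    {"region": "east", "units": 3, "price": 9.5},
    {"region": "west", "units": 5, "price": 7.0},
    {"region": "east", "units": 2, "price": 9.5},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per field."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every record is visited just to read a single field.
total_from_rows = sum(record["units"] for record in rows)

# Column layout: only the contiguous "units" values are scanned.
total_from_columns = sum(columns["units"])
```

Both totals are the same; the difference is how much data has to be read to compute them, which is what matters once the table has millions of rows and dozens of fields.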
However, the column store will take some time. When asked how long, Horowitz said it would be a year or two before it comes out.
In the meantime, there are other features in MongoDB that lend themselves to analytics, including the capability for users to isolate analytics workloads from transactional ones.
“In Mongo you can have a cluster and mark certain nodes as analytics only, so that your production traffic never hits them and only analytics traffic goes there, so you can have really good isolation and make sure your bad Tableau query doesn’t bring down your production database,” he says. “In Mongo that’s pretty easy. In other databases, that’s not true.”
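One common way to get this kind of isolation is to tag replica set members and have analytics clients connect with a matching read preference. The sketch below only builds the two connection strings. `readPreference` and `readPreferenceTags` are standard MongoDB connection string options; the host names, the replica set name `rs0`, and the `workload:analytics` tag are hypothetical, and the sketch assumes the analytics members have already been given that tag in the replica set configuration.

```python
from urllib.parse import urlencode

# Hypothetical hosts and replica set name, for illustration only.
HOSTS = [
    "db1.example.net:27017",
    "db2.example.net:27017",
    "db3.example.net:27017",
]
REPLICA_SET = "rs0"


def build_uri(hosts, replica_set, analytics=False):
    """Return a MongoDB URI; analytics clients read only from tagged secondaries."""
    options = {"replicaSet": replica_set}
    if analytics:
        # Route reads to secondaries carrying the (assumed) analytics tag.
        options["readPreference"] = "secondary"
        options["readPreferenceTags"] = "workload:analytics"
    # Keep ":" unescaped so the tag reads as key:value in the URI.
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options, safe=":")


production_uri = build_uri(HOSTS, REPLICA_SET)
analytics_uri = build_uri(HOSTS, REPLICA_SET, analytics=True)
```

The application would keep using `production_uri`, while a BI tool or reporting job would be handed `analytics_uri`, so a heavy query lands only on the members set aside for it.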