MongoDB Automates Resharding, Adds Time-Series Support
Resharding a distributed database is among the most time-consuming tasks that DBAs are asked to perform. But that pain may be a thing of the past, at least with MongoDB’s database, which brings a new automated resharding capability in version 5, along with support for time-series data and other new data features.
The new automated resharding functionality will allow customers to speed up their databases without the hassle and downtime associated with manually resharding their data. Garaudy Etienne, MongoDB’s product manager for sharding, discussed the importance of the new feature during a presentation at today’s MongoDB.live event.
“Let’s take a database where you collect orders, and you want to evenly distribute your orders,” Etienne says. “You decide to pick the randomly generated order ID as the shard key. Orders are then distributed across all the shards by order ID.”
However, as you begin to run the database workload with that shard key, Etienne says, you start to realize that it was a mistake to distribute the data that way.
“Since the customer will have more than one order…it’s likely that the query will need to access multiple shards,” he says. “For performance reasons, you want to avoid unnecessary multi-shard queries. Looking back, it would have been better to have the orders of each customer in a single shard.”
In this example, it would have been better to shard the data based on a combination of the customer ID and the order ID.
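The trade-off Etienne describes can be seen in a small simulation. The sketch below is not MongoDB code: the four-shard cluster, the `shard_for` hash function and the generated orders are all assumptions made for illustration. It places orders by a hash of the chosen key and counts how many shards hold each customer's orders. Keying on the customer ID alone stands in, roughly, for a compound key that leads with the customer ID.

```python
import hashlib
import random

NUM_SHARDS = 4  # hypothetical cluster size, chosen for illustration


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key value to a shard with a stable hash (a toy stand-in, not MongoDB's algorithm)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def make_orders(num_customers=50, orders_per_customer=8, seed=0):
    """Generate made-up orders, each with a random order ID, for a fixed set of customers."""
    rng = random.Random(seed)
    return [
        {"customer_id": f"cust-{c}", "order_id": f"{rng.getrandbits(64):016x}"}
        for c in range(num_customers)
        for _ in range(orders_per_customer)
    ]


def avg_shards_per_customer(orders, key_fn):
    """Average number of distinct shards that hold one customer's orders."""
    touched = {}
    for order in orders:
        shard = shard_for(key_fn(order))
        touched.setdefault(order["customer_id"], set()).add(shard)
    return sum(len(shards) for shards in touched.values()) / len(touched)


orders = make_orders()
by_order_id = avg_shards_per_customer(orders, lambda o: o["order_id"])
by_customer_id = avg_shards_per_customer(orders, lambda o: o["customer_id"])

print("avg shards per customer, keyed on order ID:   ", by_order_id)
print("avg shards per customer, keyed on customer ID:", by_customer_id)
```

Keyed on the customer ID, every customer's orders land on exactly one shard in this toy model, so a per-customer query touches a single shard. Keyed on the random order ID, the same query typically has to fan out to several shards.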
“Circumstances and business requirements have changed, and now you need the way the data is distributed across shards to adapt to these changes,” he explains. “In the past this would mean you would need to manually change the shard key. You would either dump and reload your data with a new shard key, which probably means downtime for your application. Or you would write your own migration scripts, which would move data in the background to a new cluster while your application remains up.”
Both of these are complex and expensive processes that can take days or even weeks to complete, Etienne says. “But with MongoDB 5.0, we’re introducing live resharding,” he says. “It does all the work for you. It’s fully automated and it runs in the background. So you can reshard your collection without any downtime or manual operations.”
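In practice, live resharding is started with the `reshardCollection` administrative command, which takes the collection's namespace and the new shard key. The sketch below only builds that command document. The `shop.orders` namespace, the field names and the `build_reshard_command` helper are hypothetical, and the commented-out lines assume the PyMongo driver and a sharded cluster running 5.0 or later.

```python
def build_reshard_command(namespace, new_key_fields):
    """Build the admin command document that reshards `namespace` on a new key."""
    if "." not in namespace:
        raise ValueError("namespace must look like '<database>.<collection>'")
    if not new_key_fields:
        raise ValueError("the new shard key needs at least one field")
    return {
        # The command name must be the first key in the document.
        "reshardCollection": namespace,
        # Ascending (1) compound key: customer first, then order.
        "key": {field: 1 for field in new_key_fields},
    }


# Hypothetical namespace and fields, following the orders example above.
reshard_cmd = build_reshard_command("shop.orders", ["customer_id", "order_id"])
print(reshard_cmd)

# Against a real cluster, the document would be sent through a mongos router,
# for example with PyMongo (assumed here, not run):
#   from pymongo import MongoClient
#   MongoClient("<connection string>").admin.command(reshard_cmd)
```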
This release also introduces support for time-series data collections, which the company says will be beneficial for certain types of applications where time is a critical factor.
“This is a game-changer for everyone who develops and runs applications that either produce or need to process massive amounts of time-series data,” says Mark Porter, MongoDB’s CTO, in a presentation at MongoDB.live today.
“For example in IoT, in financial analytics, or in operational investigations, with our new time-series collections, you don’t have to use a separate data store any more for your time-series data. It’s all in MongoDB.”
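A time-series collection is declared when the collection is created, by naming the field that holds each measurement's timestamp and, optionally, a field that identifies the source of the measurement. The sketch below builds those options and a sample document. The collection name `readings`, the field names and both helper functions are assumptions for illustration, and the commented-out call assumes the PyMongo driver.

```python
from datetime import datetime, timezone

VALID_GRANULARITIES = ("seconds", "minutes", "hours")


def timeseries_options(time_field, meta_field=None, granularity="seconds"):
    """Build the `timeseries` options document for creating a collection."""
    if granularity not in VALID_GRANULARITIES:
        raise ValueError(f"granularity must be one of {VALID_GRANULARITIES}")
    options = {"timeField": time_field, "granularity": granularity}
    if meta_field is not None:
        options["metaField"] = meta_field
    return options


def sensor_reading(sensor_id, temperature_c, ts=None):
    """Shape one measurement: a timestamp, metadata about the source, and the value."""
    return {
        "ts": ts or datetime.now(timezone.utc),
        "sensor": {"id": sensor_id},
        "temperature_c": temperature_c,
    }


ts_options = timeseries_options("ts", meta_field="sensor", granularity="minutes")
reading = sensor_reading("sensor-42", 21.5)
print(ts_options)

# With PyMongo and a MongoDB 5.0+ deployment (assumed here, not run):
#   db.create_collection("readings", timeseries=ts_options)
#   db["readings"].insert_one(reading)
```

After creation, documents are inserted and queried like those in any other collection, which is the point Porter makes about not needing a separate data store.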
One early adopter of time-series collections is Bosch IoT Insights, which develops a cloud-based offering for analyzing IoT data. Before adopting the new database, Bosch relied on a custom solution with its own bespoke data model for time-series data, which added complexity and friction for developers and customers alike, says Erwin Segerer, a software developer with Bosch.IO.
“MongoDB 5.0 and its time series collections radically simplifies our technology stack and improves user experience,” Segerer says in a press release. “IoT data is automatically stored in a highly optimized format that reduces storage consumption while also enabling fast and efficient queries and analytics against the data. As a result, users unlock insights faster–no matter if it is time series or non time-series data–all while working with a single intuitive and powerful query API.”
Another new feature in MongoDB 5.0 is a versioned API. According to Porter, this capability will ensure that the API stays the same, even as the application or the database changes underneath it.
“Starting with MongoDB 5.0, you can tag your application to the database API functionality you originally built your application against, and we will keep that API available, functioning exactly the same for many years,” Porter says in the video. “So you don’t have to upgrade your apps on a schedule your IT department sets for your database. Instead you upgrade on the schedule that is right for you and right for your application.”
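The tagging Porter describes amounts to the application declaring an API version with each command it sends. Drivers normally do this on the application's behalf, but the sketch below shows the idea by adding the version fields to a command document by hand. The `with_stable_api` helper and the sample query are hypothetical, and the commented-out lines assume the PyMongo driver.

```python
def with_stable_api(command, version="1", strict=True, deprecation_errors=False):
    """Return a copy of `command` tagged with the API version it was written against."""
    tagged = dict(command)  # leave the caller's document untouched
    tagged["apiVersion"] = version
    # When strict, the server rejects commands that are outside the declared version.
    tagged["apiStrict"] = strict
    # When set, the server rejects commands deprecated in the declared version.
    tagged["apiDeprecationErrors"] = deprecation_errors
    return tagged


# Hypothetical query, reusing the orders example from earlier in the article.
find_cmd = {"find": "orders", "filter": {"customer_id": "cust-1"}}
tagged_cmd = with_stable_api(find_cmd)
print(tagged_cmd)

# Drivers usually set this once per client rather than per command,
# for example with PyMongo (assumed here, not run):
#   from pymongo import MongoClient
#   from pymongo.server_api import ServerApi
#   client = MongoClient("<connection string>", server_api=ServerApi("1", strict=True))
```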
This release also brings support for client-side field level encryption in multi-cloud environments, which will bolster privacy, MongoDB says. This function will work with Atlas, which is MongoDB’s hosted database service.
MongoDB is also rolling out a preview of a new serverless offering for Atlas, which will improve the way developers interact with the hosted offering. MongoDB is also bringing function scoring to Atlas Search, which will let users apply mathematical formulas to fields within documents to adjust how relevant those documents are ranked in search results.
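As a rough illustration of function scoring, the sketch below builds a `$search` aggregation stage that multiplies each document's text-relevance score by a numeric field stored in the document. The index name, the `title` and `rating` fields and the `boosted_text_search` helper are assumptions, and the exact option names should be checked against the Atlas Search documentation before use.

```python
def boosted_text_search(query, text_field, boost_field, index="default", missing=1):
    """Build a $search stage whose score is relevance multiplied by a numeric field."""
    return {
        "$search": {
            "index": index,
            "text": {
                "query": query,
                "path": text_field,
                "score": {
                    "function": {
                        "multiply": [
                            # The score the text operator would assign on its own.
                            {"score": "relevance"},
                            # A numeric field from the document; `missing` is the
                            # fallback value when the field is absent.
                            {"path": {"value": boost_field, "undefined": missing}},
                        ]
                    }
                },
            },
        }
    }


# Hypothetical fields: search titles, and favor documents with a higher rating.
search_stage = boosted_text_search("wireless headphones", "title", "rating")
print(search_stage)

# The stage would be the first step of an aggregation pipeline on an Atlas
# cluster with a search index, for example with PyMongo (assumed, not run):
#   results = collection.aggregate([search_stage, {"$limit": 10}])
```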
MongoDB.live continues this week at https://www.mongodb.com/live.