Why Databases Are Needed on the Edge
The proliferation of smart devices and the IoT are putting pressure on developers to adopt more sophisticated processes to manage data. Some developers may prefer to continue using simpler file systems to keep track of data, while others want to outsource that responsibility to the cloud. But as the number and sophistication of edge applications increases, the use of databases on the edge will grow, experts say.
The IoT is a big place, and one-size-fits-all solutions obviously won’t work. Not every application that is running on the edge (such as a server in a retail outlet) or running on an actual endpoint (like a sensor on a turbine) needs a full-fledged database underneath it to manage data.
In some cases, when the projected data sizes are small, when the complexity is low, and processing needs are slight, then developers can get by just using a file system, or storing data in memory (if it doesn’t need to be recalled). In some cases, where event data is more important than maintaining state, a message bus like Apache Kafka may be a better choice (assuming the network is stable).
But in other use cases, it will make sense for the application architect to move up to a full-fledged database. That’s true when multiple applications want to access the same data, when data needs to be replicated in real time, or when network connections (for cloud storage) aren’t reliable, among other requirements.
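To make those criteria concrete, here is a minimal sketch of the embedded-database pattern the article describes. Zen's own API isn't shown here; SQLite (Python's built-in `sqlite3` module) is used purely as a stand-in embedded database, and the table name and sensor ID are invented for illustration. WAL mode is what lets several local applications read the same database file while one writes to it.

```python
import sqlite3

# SQLite stands in for any embedded edge database (Zen, c-tree, etc.).
# The data lives in a local file, so it survives network outages.
conn = sqlite3.connect("edge.db")
conn.execute("PRAGMA journal_mode=WAL")  # concurrent readers + one writer

conn.execute("""CREATE TABLE IF NOT EXISTS readings (
    sensor_id TEXT, ts REAL, value REAL)""")

# A sensor application writes a reading locally...
conn.execute("INSERT INTO readings VALUES (?, ?, ?)",
             ("turbine-7", 1700000000.0, 42.5))
conn.commit()

# ...and a second application on the same device can query it.
rows = conn.execute(
    "SELECT sensor_id, value FROM readings").fetchall()
print(rows)
conn.close()
```

The point of the sketch is the shape of the solution, not the engine: once two applications need the same data on one device, a shared file or in-memory structure forces you to hand-roll locking and durability that an embedded database already provides.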
A Natural Aversion to DBs
Let’s face it: Programmers don’t like databases. While developers have grown to love some databases (think MongoDB), the majority of them would rather not deal with the overhead and limitations that databases bring to what they do.
It’s something that Lewis Carr, an IT industry veteran who is the senior director of product marketing at database developer Actian, has experienced many times.
“I’ve gone around the world and heard developers say ‘Oh, databases, yuck! I don’t want to do that!’” Carr says. “But with IoT, there are multiple requirements as to why you need to use something more than file management to support the data management needs you have.”
Zen is the name of Actian’s flagship embedded database. The software, which has been in development for more than 25 years (it was formerly known as Btrieve and later Pervasive PSQL), can fit into a 4MB to 7MB footprint, but holds up to 64TB of data. That makes it scalable enough to handle a range of field deployment tasks, from helping to manage incoming data in a smart car to powering an ERP system for schools in Quebec. With support for Raspberry Pis, as well as mobile OSes, it can run practically anywhere.
For Carr, who has helped build LIDAR systems for the French Army and helped Japanese auto manufacturers squeeze a database into cars, the decision to adopt a database hinges on the complexity of data processing requirements. When more decision-making takes place on the edge or in the endpoint application, a database is usually the path to get there.
“What you’re going to see over time is these mobile applications are going to start sharing data and speaking with each other at that local point,” Carr tells Datanami. “In that case, I need a database or data management [system] that can talk across them.”
5G Demands DBs
As 5G is rolled out and the network strengthens, developers will be encouraged to build bigger and more sophisticated applications. Some developers will remain averse to embedded databases – perhaps opting instead for something like an event mesh to organize and orchestrate the data while maintaining state – but in the end, 5G will increase the need for more and better local control of data on the device itself, not shrink it, Carr says.
“All of these things that haven’t happened yet, but will in the next five years because of 5G and because of our increasingly mobile workforce or remote workforce – the application structure is going to change and the data management is going to become something that’s far more important,” he says.
At the high end, applications that are asked to fuse multiple incoming streams of data and perform machine learning and analytics on them – such as with an autonomous car – will absolutely need a database to handle all of these disparate tasks. And if an engineer who is writing those ML algorithms wants to tap into data on the car, quickly and remotely…well, a database is your best choice for pushing that data out.
“If I’m a data scientist and I’m beginning to train an algorithm and tune it, I need a large set of data to do that. Where am I going to get that data?” Carr asks. “Maybe at first, I’ve got some model of data, but later I want to be able to extract that data from the edge. I want to be able to do that, and I need to access those databases and be able to query and pull data from it.”
Databases In the Sky
Gartner certainly is right that the cloud is becoming the default deployment path for databases. Driven by developer demands, database vendors are flocking to offer their wares on the big public clouds, where they can power ever-more-sophisticated Web and mobile apps.
But this isn’t likely to fly out in the real world (i.e. the IoT). Because, no matter how fast the forthcoming 5G network ends up being, there are going to be gaps in the wireless coverage. And if you’ve built an IoT application to rely on a database residing in the cloud to store data, that means the application will stop working when the user moves into a disconnected region, which leads to a horrible experience for that customer.
Another argument in favor of local control over data is the inherent latency of a network connection, argues Mike Bowers, the chief architect at FairCom Corporation.
“While cloud computing is fast and highly available, the network to the cloud is slow and less available,” Bowers writes in a 2019 blog on his company’s website. “Because device data must be collected and transmitted across the network to the cloud, the network slows the collection of data and increases the risk of losing data.”
But there’s an even better reason to run an edge database than lost data or horrible UXs, according to Bowers: the inherent complexity of the data movement we’re about to see.
“An edge database has specific features that make it easy for an application to focus on adding business value rather than implementing mechanical processes for integrating, collecting and aggregating data in the edge, and then delivering resulting data to the cloud,” he writes. “These features include automatic data integration with devices, automatic collection of data, automatic aggregation of data, automatic synchronization of data between databases and systems, and automatic synchronization of data through intermittent and/or slow networks.”
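One of the features Bowers lists – synchronization through intermittent networks – is often implemented as a store-and-forward "outbox": writes land in the local database first, and a sync loop forwards them to the cloud whenever the network cooperates. The sketch below illustrates the idea only; the `outbox` table and the `upload` callable are hypothetical stand-ins for whatever cloud API a real deployment would use, and a product like c-treeEDGE would handle this automatically.

```python
import sqlite3

def sync_pending(conn, upload):
    """Forward locally buffered rows to the cloud.

    `upload` is a hypothetical callable standing in for the cloud API;
    it returns True on success. Rows stay queued until they succeed,
    so a dropped connection never loses data.
    """
    pending = conn.execute(
        "SELECT rowid, payload FROM outbox WHERE synced = 0").fetchall()
    for rowid, payload in pending:
        if upload(payload):  # network may be down; row simply stays queued
            conn.execute(
                "UPDATE outbox SET synced = 1 WHERE rowid = ?", (rowid,))
    conn.commit()
    return conn.execute(
        "SELECT COUNT(*) FROM outbox WHERE synced = 0").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (payload TEXT, synced INTEGER DEFAULT 0)")
conn.execute("INSERT INTO outbox (payload) VALUES ('{\"temp\": 21.4}')")

left_offline = sync_pending(conn, upload=lambda p: False)  # network down
print(left_offline)  # → 1 row still queued

left_online = sync_pending(conn, upload=lambda p: True)    # network back up
print(left_online)   # → 0
```

The value of pushing this into the database layer, as Bowers argues, is that the application never has to reason about connectivity at all: it just writes locally, and delivery is someone else's job.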
According to a McKinsey study, data has 2.5 times as much value on the edge as it does when it’s stored in the cloud. The quicker a decision can be made on fresh data captured at the edge, the better that decision is going to be.
FairCom this week announced that it’s splitting its edge database, which was formerly called c-treeEDGE IoT Database, into two products under the EDGE V3 name. EDGE IoT will focus on providing fast transaction processing for embedded applications, with support for data replication, REST data services, and the MQTT data broker. EDGE IIoT Hub, meanwhile, will focus on providing system integrators and operational technology (OT) engineers with a way to integrate multiple streams of data in an industrial Internet of Things (IIoT) setting spanning factories, clouds, and on-prem data centers.
These are exciting times for IoT and exciting times for data management. As IoT deployments and data volumes continue to grow, the intersection of those two things points in one clear direction: The need for more sophisticated data management on the edge is increasing, not decreasing. Sorry, edge developers, but prepare yourselves for more databases, not fewer.