Apache Promotes IoT Database Project
An open source Internet of Things database effort has been elevated as a “top-level project” by the Apache Software Foundation.
Launched as a research project at Tsinghua University and accepted as an incubator effort in 2018, Apache IoTDB targets industrial IoT applications and attendant analytics and data storage requirements.
The IoTDB effort addresses current data management shortfalls exhibited by both relational and key-value store databases as industrial IoT deployments add to the data tsunami. Promoters position the IoT database as “the missing link” between current IoT data and applications.
The goal is “redefining how IoT data is managed, both in the cloud and on the edge,” said Xiangdong Huang, vice president of Apache IoTDB.
The IoT-native database uses a columnar file format dubbed TsFile, which is billed as a more efficient way to store and access time-series data and thereby improves query performance. The database engine is tuned for time-series operations, including aggregation queries and time-alignment queries, the foundation said this week.
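The article does not show what these query types look like. As a rough illustration only, the Python sketch below assumes each time series is a list of (timestamp, value) pairs already sorted by time; `align_by_time` and `windowed_avg` are hypothetical helpers, not IoTDB APIs. A time-alignment query merges several series into rows keyed by timestamp, and a simple aggregation query averages values over fixed time windows.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a series is a list of (timestamp, value)
# pairs, already sorted by timestamp.
Series = List[Tuple[int, float]]
Row = Tuple[int, Dict[str, Optional[float]]]


def align_by_time(series: Dict[str, Series]) -> List[Row]:
    """Merge several time-ordered series into rows keyed by timestamp.

    A series with no point at a given timestamp gets None in that row.
    """
    names = list(series)
    # heapq.merge performs a k-way merge of inputs that are already sorted,
    # so no global sort is needed.
    merged = heapq.merge(
        *[[(ts, name, v) for ts, v in series[name]] for name in names]
    )
    rows: List[Row] = []
    for ts, name, value in merged:
        if not rows or rows[-1][0] != ts:
            # First point seen at this timestamp: start a new, empty row.
            rows.append((ts, {n: None for n in names}))
        rows[-1][1][name] = value
    return rows


def windowed_avg(points: Series, start: int, width: int) -> Dict[int, float]:
    """Average values in fixed, non-overlapping windows of `width` time units.

    Keys are window start times; points before `start` are ignored.
    """
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for ts, v in points:
        if ts < start:
            continue
        window = start + ((ts - start) // width) * width
        sums[window] = sums.get(window, 0.0) + v
        counts[window] = counts.get(window, 0) + 1
    return {w: sums[w] / counts[w] for w in sums}
```

Because every input is already time-ordered, the alignment is a single k-way merge rather than a sort, which hints at why a time-ordered storage format suits these queries.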
“Apache IoTDB easily meets the requirements of storing massive data sets, ingesting high-speed data and analyzing complex data, both on the edge and the cloud,” promoters said.
Industrial IoT deployments are driving the open source effort. “When IoT is used in industrial applications, intelligent equipment usually produces one to two orders of magnitude more data than consumer-oriented IoT devices,” according to a research paper published by IoTDB developers. “This makes it even harder for analytics to produce valuable insights in a reasonable amount of time.”
They argued that IoT has spawned new time-series processing workloads tied to edge computing and soaring historical data volumes, along with the need for efficient data ingestion, complex low-latency queries, and advanced data analytics.
As the primary file format for time-series data storage in the IoT database, TsFile consists of data pages and “chunks” along with an accompanying index. Each chunk stores time-series data for a particular time range and is in turn divided into several pages, which serve as the fundamental unit of disk storage.
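To make the chunk, page, and index relationship more concrete, here is a minimal Python sketch. It is not the real TsFile layout: the class names, the `page_size` parameter, and the series name in the usage line are hypothetical. It assumes a chunk holds one series' points sorted by time, split into fixed-size pages, with an index entry recording each chunk's time bounds.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[int, float]  # (timestamp, value)


@dataclass
class Page:
    """Hypothetical page: a run of time-sorted points."""
    points: List[Point]

    @property
    def min_time(self) -> int:
        return self.points[0][0]

    @property
    def max_time(self) -> int:
        return self.points[-1][0]


@dataclass
class Chunk:
    """Hypothetical chunk: one series' data for a time range, split into pages."""
    series: str
    pages: List[Page]


@dataclass
class IndexEntry:
    """Hypothetical index entry: lets a reader locate a chunk by series and time."""
    series: str
    min_time: int
    max_time: int
    chunk_no: int  # position of the chunk in the file's chunk list


@dataclass
class SketchFile:
    chunks: List[Chunk] = field(default_factory=list)
    index: List[IndexEntry] = field(default_factory=list)

    def write_chunk(self, series: str, points: List[Point], page_size: int) -> None:
        """Sort points by time, split them into pages, and record an index entry."""
        if not points:
            raise ValueError("a chunk needs at least one point")
        ordered = sorted(points)  # tuples sort by timestamp first
        pages = [
            Page(ordered[i:i + page_size])
            for i in range(0, len(ordered), page_size)
        ]
        self.chunks.append(Chunk(series, pages))
        self.index.append(
            IndexEntry(series, ordered[0][0], ordered[-1][0], len(self.chunks) - 1)
        )
```

For example, `SketchFile().write_chunk("turbine1.speed", points, page_size=2)` would store five unordered points as one chunk of three time-sorted pages.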
Within TsFile, the data in each time series' chunks is ordered by time, which accelerates queries: those with time-range filters can skip chunks that fall outside the selected time window.
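The chunk-skipping behavior can be sketched as follows, assuming (hypothetically) that the chunks of one series are stored in time order with non-overlapping time ranges and that each chunk's minimum and maximum timestamps are available from the index. Under that assumption a binary search finds the first chunk that can overlap the query window, and the scan stops at the first chunk that starts after it. `ChunkMeta` and `range_query` are illustrative names, not IoTDB APIs.

```python
from bisect import bisect_left
from typing import List, NamedTuple, Tuple

Point = Tuple[int, float]  # (timestamp, value)


class ChunkMeta(NamedTuple):
    min_time: int
    max_time: int
    points: List[Point]  # the chunk's data, sorted by timestamp


def range_query(chunks: List[ChunkMeta], start: int, end: int) -> Tuple[List[Point], int]:
    """Return points with start <= ts <= end and the number of chunks scanned.

    Assumes the chunks of one series are sorted by time and do not overlap.
    """
    max_times = [c.max_time for c in chunks]
    # Every chunk ending before `start` is skipped: find the first one that doesn't.
    i = bisect_left(max_times, start)
    result: List[Point] = []
    scanned = 0
    # Stop as soon as a chunk begins after `end`; later chunks start even later.
    while i < len(chunks) and chunks[i].min_time <= end:
        scanned += 1
        result.extend(p for p in chunks[i].points if start <= p[0] <= end)
        i += 1
    return result, scanned
```

With three chunks covering timestamps 0 to 9, 10 to 19, and 20 to 29, a query for 12 to 22 touches only the last two; the first chunk is never scanned.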
The IoT database query engine “takes full advantage of the time-ordered property of TsFiles to reduce I/O and latency for queries with time and value predicates,” the researchers reported.
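The researchers do not spell out how value predicates are handled. One common technique, assumed here rather than taken from the article, is to keep summary statistics (time and value bounds) per page so the engine can rule out whole pages without decoding them. In this Python sketch `PageStats`, `make_page`, and `query` are hypothetical names; the time ordering lets the loop stop as soon as a page starts after the query window.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[int, float]  # (timestamp, value)


class PageStats(NamedTuple):
    min_time: int
    max_time: int
    min_value: float
    max_value: float


class StoredPage(NamedTuple):
    stats: PageStats
    points: List[Point]  # sorted by timestamp


def make_page(points: List[Point]) -> StoredPage:
    """Build a page and its summary statistics from time-sorted points."""
    values = [v for _, v in points]
    stats = PageStats(points[0][0], points[-1][0], min(values), max(values))
    return StoredPage(stats, points)


def query(pages: List[StoredPage], start: int, end: int,
          threshold: float) -> Tuple[List[Point], int]:
    """Return points with start <= ts <= end and value > threshold.

    Also returns how many pages had to be decoded. Pages are assumed to be
    stored in time order.
    """
    hits: List[Point] = []
    decoded = 0
    for page in pages:
        s = page.stats
        if s.min_time > end:
            break  # time order: no later page can satisfy the time predicate
        if s.max_time < start or s.max_value <= threshold:
            continue  # the statistics prove this page holds no matching point
        decoded += 1
        hits.extend(
            p for p in page.points if start <= p[0] <= end and p[1] > threshold
        )
    return hits, decoded
```

The time bounds prune pages by the time predicate and the value bounds prune them by the value predicate, so only pages that might contain a match are read.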
They used an edge-to-cloud data management application to demonstrate how the IoT database processes time-series data in real time. The database also demonstrated support for advanced analytics through integration with Hadoop and Apache Spark, the researchers said.