September 9, 2014

MapR Reports Accelerated OpenTSDB Performance

Eyeing new Internet of Things (IoT) applications, MapR Technologies said its open-source distribution of Apache Hadoop “ingested” more than 100 million data points per second.

The performance benchmark for the MapR distribution with its in-Hadoop NoSQL database, MapR-DB, was achieved using only four nodes of a ten-node cluster. By accelerating its OpenTSDB software by a factor of 1,000 on a small cluster, MapR claimed the performance clears the way for managing huge amounts of data along with IoT and other real-time data analysis applications.

Such advances will be needed soon amid soaring data projections over the next decade as IoT sensors begin sweeping up more unstructured data. For example, one recent forecast predicted a ten-fold increase in data volumes by 2020. Moreover, networking specialist Cisco Systems estimates there will be 50 billion connected devices in the same timeframe.

MapR also projects that the accelerated performance of OpenTSDB could enable specific data analysis applications such as datacenter and industrial monitoring, as well as predictive maintenance of distributed hardware systems.

The company said OpenTSDB is primarily used to store and analyze time-series data, that is, a sequence of successive data points. “Originally designed for only datacenter monitoring, poor ingest performance had limited the expansion of its use,” Ted Dunning, MapR’s chief application architect, explained in a statement announcing the accelerated performance. “This benchmark demonstrates a viable option for new applications.”

MapR asserts that time series databases that can take advantage of accelerated performance from tools like OpenTSDB will be needed to store and analyze huge new datasets in real time.

MapR announced the acceleration of OpenTSDB performance by 1,000 times on a four-node cluster during this week’s Tableau Conference in Seattle.

MapR describes OpenTSDB as a scalable time series database built on top of Hadoop and the column-oriented database management system Apache HBase. It is touted as simplifying the process of storing and analyzing large amounts of time-series data from sources like server operations and load metrics as well as sensors measuring, for example, environmental data.

According to MapR, OpenTSDB works by exposing two programming interfaces: Write API, in which servers or sensors send data to the API and OpenTSDB formats the data and stores it in HBase; and Read API, in which users or software access the interface to retrieve time-series data that is aggregated, registered, grouped and graphed as it is retrieved.
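To make the two interfaces concrete, here is a minimal sketch of what a write and a read request might look like. It assumes the HTTP API of OpenTSDB 2.x (`/api/put` and `/api/query`); the article does not say which interface MapR's benchmark used. The host name, metric name, and helper function names are hypothetical, and the code only builds the requests without sending them.

```python
import json
from urllib.parse import urlencode


def build_put_payload(metric, timestamp, value, tags):
    """Build the JSON body for one data point (assumed /api/put format)."""
    return json.dumps(
        {"metric": metric, "timestamp": timestamp, "value": value, "tags": tags}
    )


def build_query_url(base_url, metric, start, aggregator="sum", tags=None):
    """Build a read URL (assumed /api/query format) for one aggregated metric."""
    tag_filter = ""
    if tags:
        # Tags narrow the series, e.g. {host=web01}
        pairs = ",".join(f"{key}={val}" for key, val in sorted(tags.items()))
        tag_filter = "{" + pairs + "}"
    params = {"start": start, "m": f"{aggregator}:{metric}{tag_filter}"}
    return f"{base_url}/api/query?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint; 4242 is the port OpenTSDB commonly listens on.
    base = "http://tsdb.example.com:4242"
    # Write side: a server reports one CPU reading.
    print(build_put_payload("sys.cpu.user", 1410220800, 42.5, {"host": "web01"}))
    # Read side: request the summed series for the last hour.
    print(build_query_url(base, "sys.cpu.user", "1h-ago", tags={"host": "web01"}))
```

In this sketch, a sensor or server would POST the payload to the write endpoint, while a dashboard or analysis job would GET the query URL and receive the aggregated series.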

OpenTSDB is designed to work natively with MapR-DB, which in turn implements the HBase API. The company said it has recently enhanced OpenTSDB to improve performance and scalability “by several orders of magnitude.” The enhancements are intended to make it a preferred solution for very large-scale time-series analysis applications.

MapR said an advance copy of OpenTSDB with the enhancements is available at the mapr-demos GitHub page.
