The Data of Things: How Edge Analytics and IoT Go Hand In Hand
In the next five years, 15 to 40 billion additional connected devices are expected to hit the market. By some estimates, that would be a 285 percent increase over the number of connected devices currently available. With so many more connected devices anticipated, a flood of data from “things,” whether a smart video camera or a smart traffic light, will likely raise a new set of challenges for enterprises.
Those that understand that IoT data’s real value will come from analysis will reap the benefits and drive business value – such as predictive maintenance, additional efficiencies, and even new services for customers. As enterprises take IoT head on, understanding what purpose and role data serves will be crucial, for the real value of IoT will lie in analyzing streams of information to make better, and faster, business decisions.
Big IoT, Big Data
For many people, big data is, erroneously, synonymous with the Hadoop framework. But Hadoop does not have the ability to deal with real-time, streaming data, which is what IoT produces. While IoT data has characteristics similar to those of big data, it is much more complex. IoT data is:
- Messy, noisy, and sometimes intermittent because sensors are often deployed in the field. IoT data is ultimately collected by sensors sitting somewhere – for example, a sensor could be deployed on a telephone pole or street light. Sensors often cut in and out.
- Often highly unstructured, and sourced from a variety of sensors (fixed and mobile)
- Dynamic – “data in motion” as opposed to the traditional “data at rest”
- Sometimes indirect – a relevant quantity cannot always be measured directly and has to be inferred, for example by using a video camera with video analytics to count people in a certain area
Collecting information from sensors and bringing it into one central computing station is not a scalable long-term solution, particularly as the volume of IoT devices and data is forecast to explode. Bringing such a large amount of data to a relatively small number of data centers, where it is then analyzed in the cloud, will simply not scale. It will also be costly, because transporting bits from here to there costs money.
With so many devices producing so much data, a correspondingly large array of analytics, compute, storage and networking power and infrastructure is essential. Though analytics will be necessary to the growth and business value of IoT, the traditional approach to analytics won’t be the right fit.
To The Edge!
A clear solution that addresses scale and efficiency has arisen: distribute the analytics to the edge, or very close to it. Enterprises must harness the intelligence of the myriad smart devices and their low-cost computational power to run valuable analytics on the device itself. Multiple devices are usually connected to a local gateway where potentially more compute power is available (like Cisco’s IOx), enabling more complex multi-device analytics close to the edge.
How does distributed IoT analytics work? The hierarchy begins with “simple” analytics on the smart device itself, moves to more complex multi-device analytics on the IoT gateways, and ends with the heavy lifting, the big data analytics, running in the cloud. This distribution of analytics takes load off the network and the data centers by creating a model that scales. Distributing the analytics to the edge is the only way to progress.
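The three tiers described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the sensor names, and the choice of count, mean, and maximum as the summary are all hypothetical, and a real deployment would pick summaries that suit its application.

```python
from statistics import mean


def device_summary(readings):
    """Device tier: reduce raw samples to a small summary (assumed fields)."""
    return {"count": len(readings), "mean": mean(readings), "max": max(readings)}


def gateway_aggregate(summaries):
    """Gateway tier: combine the summaries of several devices into one record."""
    total = sum(s["count"] for s in summaries)
    # Weight each device mean by its sample count to get the overall mean.
    weighted_mean = sum(s["mean"] * s["count"] for s in summaries) / total
    return {
        "devices": len(summaries),
        "count": total,
        "mean": weighted_mean,
        "max": max(s["max"] for s in summaries),
    }


# Hypothetical raw readings held on two devices; these never leave the edge.
raw = {"sensor-a": [20.0, 22.0, 21.0], "sensor-b": [30.0, 30.0, 33.0]}

summaries = [device_summary(samples) for samples in raw.values()]
to_cloud = gateway_aggregate(summaries)  # the only record sent upstream
print(to_cloud)
```

Only the single `to_cloud` record travels to the data center, where the heavier big data analytics would run on the accumulated records.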
Edge IoT analytics is about more than just operational efficiencies and scalability. Many business processes do not require “heavy duty” analytics, and therefore the data collected, processed and analyzed on or near the edge can drive automated decisions. For example, a local valve can be turned off when a leak is detected.
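A minimal sketch of the valve example, assuming the leak is inferred from a gap between inlet and outlet flow. The `Valve` class, the `check_for_leak` function, and the 0.5 litres-per-minute tolerance are hypothetical and stand in for whatever actuator interface and detection rule a real installation uses.

```python
class Valve:
    """Hypothetical stand-in for a locally controlled valve."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False


def check_for_leak(inlet_lpm, outlet_lpm, valve, tolerance_lpm=0.5):
    """Close the valve locally if more flows in than out (assumed leak rule)."""
    leaking = (inlet_lpm - outlet_lpm) > tolerance_lpm
    if leaking:
        valve.close()  # the decision is made at the edge, with no cloud round trip
    return leaking


valve = Valve()
check_for_leak(10.0, 9.8, valve)  # within tolerance, valve stays open
check_for_leak(10.0, 7.0, valve)  # 3 L/min unaccounted for, valve closes
```

The point of the sketch is that the decision needs only two local readings and a threshold, so nothing has to leave the site for the valve to react.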
If latency is a concern, actions can be taken in real time at the edge to avoid delays between the sensor-registered event and the reaction to that event. This is especially true of industrial control systems, where sometimes there is no time to transmit the data to a remote cloud. Issues such as this can be remedied with a distributed analytics model.
In some sense, cities are already dealing with these types of issues. Sensors such as CCTV units and speed cameras are already deployed on our highway infrastructure, and as these become smarter, the volume of data that can be collected increases and becomes more valuable to governments and city councils looking to make their transport systems more efficient.
Even when compressed, a typical video camera can produce a few megabits of data every second. Transporting these video streams requires bandwidth. Not only does bandwidth cost money, but if you also want some quality of service, the whole thing becomes even more expensive. Thus, performing video analytics or even storage on the edge and transporting only the “results” is much cheaper.
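A rough back-of-the-envelope calculation shows the scale of the difference. The figures are assumptions chosen for illustration only: a 2 Mbit/s compressed stream (one reading of “a few megabits”) and a 200-byte results message sent once per second.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_gigabytes(bits_per_second):
    """Convert a sustained bit rate into gigabytes (10**9 bytes) per day."""
    return bits_per_second * SECONDS_PER_DAY / 8 / 1e9


video_bps = 2_000_000  # assumed: 2 Mbit/s compressed video per camera
results_bps = 200 * 8  # assumed: one 200-byte results message per second

video_gb = daily_gigabytes(video_bps)
results_gb = daily_gigabytes(results_bps)
reduction = video_bps / results_bps

print(f"raw video:    {video_gb:.2f} GB per camera per day")
print(f"results only: {results_gb:.5f} GB per camera per day")
print(f"reduction:    {reduction:.0f}x")
```

Under these assumptions a single camera ships about 21.6 GB of video per day, against roughly 17 MB of results, a reduction of about three orders of magnitude before any quality-of-service premium is considered.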
For example, a system that takes in data from thousands of sensors needs to operate on a more instantaneous basis to reflect a changing highway. In this case, the machine is trained on what represents normal traffic flow and once the learning phase is over, the machine can autonomously indicate that something abnormal has happened. This is a more efficient method of detection because it is easier to spot something abnormal once you have learnt what is normal – after all, this is the way the human brain works.
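The learn-then-detect idea can be sketched with a very simple statistical baseline. This is not the method any particular traffic system uses: the `TrafficBaseline` class, the per-minute vehicle counts, and the three-standard-deviation threshold are assumptions, and a production system working from video would rely on far richer models.

```python
from statistics import mean, stdev


class TrafficBaseline:
    """Learns what normal vehicle counts look like, then flags outliers."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # assumed: deviations beyond 3 sigma are abnormal
        self.mean = None
        self.std = None

    def learn(self, normal_counts):
        """Learning phase: summarize counts observed under normal traffic."""
        self.mean = mean(normal_counts)
        self.std = stdev(normal_counts)

    def is_abnormal(self, count):
        """Detection phase: compare a new count against the learned baseline."""
        if self.std == 0:
            return count != self.mean
        return abs(count - self.mean) / self.std > self.threshold


baseline = TrafficBaseline()
# Hypothetical vehicles-per-minute counts from the learning phase.
baseline.learn([48, 52, 50, 49, 51, 50, 47, 53])

print(baseline.is_abnormal(51))  # close to the learned normal
print(baseline.is_abnormal(5))   # traffic has nearly stopped
```

Once the learning phase is over, each new count needs only one comparison, which is cheap enough to run on or near the sensor.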
The Machines Are Learning
The term “machine” is used here to mean the computer that ingests the video stream and runs the anomaly detection algorithm. This, in principle, could be run in the camera itself, or very near to it. As you deploy smarter sensors to watch over traffic, it makes sense to do the analytics in the devices themselves rather than sending the data back to a central system for analysis, which can be inefficient and risks creating bottlenecks. A system like this will make smart city management as responsive as it needs to be in the fast-moving environment of the highway, where a split-second decision could cause or prevent a crash.
However, some trade-offs must be considered with edge analytics. Edge analytics is all about processing and analyzing subsets of all the data collected and then transmitting only the results. So, we are essentially discarding some of the raw data and potentially missing some insights. The question is: can we live with this “loss,” and if so, how should we choose which pieces we are willing to “discard” and which need to be kept and analyzed?
The answer is not simple, and is determined by the application. Some organizations may never be willing to lose any data, but the vast majority can accept that not everything can be analyzed. This is where we will have to learn by experience as organizations begin to get involved in this new field of IoT analytics and review the results.
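One simple way to see the trade-off is report-by-exception, where a device transmits a reading only when it differs from the last transmitted value by more than a chosen deadband. This is just one possible policy, offered as an illustration; the function name, the sample values, and the deadband of 1.0 are assumptions.

```python
def report_by_exception(samples, deadband):
    """Return only the samples that moved more than `deadband` since the last one sent."""
    sent = []
    last = None
    for value in samples:
        if last is None or abs(value - last) > deadband:
            sent.append(value)
            last = value
    return sent


# Hypothetical temperature readings collected on the device.
samples = [20.0, 20.1, 20.2, 21.5, 21.6, 19.0]
sent = report_by_exception(samples, deadband=1.0)

print(sent)
print(f"discarded {len(samples) - len(sent)} of {len(samples)} readings")
```

A wider deadband saves more bandwidth but hides smaller fluctuations, so the right setting depends on which insights the application can afford to miss.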
It’s also important to learn the lessons of past distributed systems. For example, when many devices are analyzing and acting on the edge, it may be important to maintain a single “up-to-date view” somewhere, which in turn may impose various constraints. The fact that many of the edge devices are also mobile complicates the situation even more.
If you believe that the IoT will expand and become as ubiquitous as predicted, then distributing the analytics and intelligence is inevitable and desirable. It will help us deal with big data and relieve bottlenecks in networks and data centers. However, it will require new tools for developing analytics-rich IoT applications.