October 5, 2015

Big Payoff for Big Data: IoT in Manufacturing

The term IoT (Internet of Things) has become a popular topic in the data analytics world. Nowadays almost everything can be equipped with a sensor, a processor, and a network connection, collecting and broadcasting ever more data into the world. While much of the discussion about IoT centers on consumer-oriented smart systems (e.g., thermostats, alarms, household appliances, and cars), one of the areas where IoT may have the biggest impact is manufacturing.

A recent Intel Blueprint entitled “Internet of Things (IoT) Delivers Business Value to Manufacturing”1 demonstrates how Big Data analytics can be used to optimize manufacturing processes, resulting in improved quality, increased throughput, better insight into shop floor issues, and reduced downtime.

The advantage of big data analytics over more conventional relational database (RDBMS) approaches lies in its ability to manage and analyze heterogeneous data sources from disparate locations. A typical production process, for example, generates many different types of data, which can be broken down into three major categories:

  • Real-time Semi-Structured Data
    • Machine builder standards such as SECS/GEM and EDA, or custom formats based on COM or XML
    • Sensors (vibration, pressure, valve, and acoustic), relays, RFID
    • Data from PLCs, motors, and drives
    • Data direct from motion controllers and robotic arms
    • Manufacturing historians (time series data structures)
  • Unstructured Data
    • Operator shift reports
    • Machine logs, error logs
    • Text messages, vision images, audio/video streams
    • Manufacturing collaboration social platforms
    • Maintenance logs
  • Structured Data
    • RDBMS, databases, NoSQL
    • Enterprise data warehousing
    • Spreadsheets
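To make the idea of combining disparate sources concrete, here is a minimal Python sketch. It assumes a hypothetical XML message format for sensor readings and a hypothetical CSV asset table; neither comes from the Intel Blueprint. The sketch flattens the semi-structured readings into records and left-joins them with the structured table on a machine ID, the kind of enrichment step that typically precedes analysis.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical real-time, semi-structured sensor messages (XML).
SENSOR_XML = """
<readings>
  <reading machine="M1" sensor="vibration" value="0.42"/>
  <reading machine="M2" sensor="vibration" value="1.37"/>
  <reading machine="M1" sensor="pressure" value="101.3"/>
</readings>
"""

# Hypothetical structured data exported from an asset database (CSV).
ASSET_CSV = """machine,line,last_service
M1,Line-A,2015-08-14
M2,Line-B,2015-06-02
"""


def parse_sensor_xml(text):
    """Flatten each <reading> element into a plain dict with a numeric value."""
    root = ET.fromstring(text.strip())
    return [
        {
            "machine": r.get("machine"),
            "sensor": r.get("sensor"),
            "value": float(r.get("value")),
        }
        for r in root.iter("reading")
    ]


def load_assets(text):
    """Index the structured asset table by machine ID."""
    return {row["machine"]: row for row in csv.DictReader(io.StringIO(text))}


def enrich(readings, assets):
    """Left-join readings with asset attributes on machine ID."""
    joined = []
    for r in readings:
        asset = assets.get(r["machine"], {})
        joined.append(
            {**r, "line": asset.get("line"), "last_service": asset.get("last_service")}
        )
    return joined


rows = enrich(parse_sensor_xml(SENSOR_XML), load_assets(ASSET_CSV))
for row in rows:
    print(row)
```

Unstructured sources such as shift reports or vision images would need their own extraction step (text parsing, image feature extraction) before they could be joined in the same way.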

Big data analytics makes it possible to use this data to enhance and optimize the production process. Beyond its variety, the data can also be large in volume: in one example offered by Intel, the machine events and parameters, error logs, and defect images from vision equipment generated in a single week approached 1 TB.
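That weekly volume implies a sustained ingest rate the storage and analytics layers must keep up with. The back-of-the-envelope calculation below assumes a decimal terabyte (10^12 bytes) and perfectly uniform arrival; real shop floor traffic is burstier, so this is only a lower bound for sizing.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def sustained_rate(bytes_per_week):
    """Average ingest rate (bytes/second) needed to absorb a weekly volume."""
    return bytes_per_week / SECONDS_PER_WEEK


one_tb = 10**12  # decimal terabyte (assumption)
rate = sustained_rate(one_tb)
print(f"{rate / 10**6:.2f} MB/s")            # 1.65 MB/s
print(f"{rate * 86400 / 10**9:.1f} GB/day")  # 142.9 GB/day
```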

In this particular instance, the Intel team started with existing machine performance and monitoring data, then used big data analytics and modeling to ingest additional data and predict potential excursions and failures. The system predicted machine component failures, allowing operators, engineers, and managers to act in advance and realize savings through improved yield, reduced repair time, and fewer spare parts.
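The article does not describe the model Intel used, so the following is only an illustration of one common, simple way to flag excursions in a sensor stream: a rolling z-score that compares each reading with the mean and standard deviation of the preceding window. The `window` and `threshold` defaults are hypothetical and would need tuning against real equipment data.

```python
from collections import deque
from statistics import mean, pstdev


def detect_excursions(values, window=20, threshold=3.0):
    """Return indices of readings that deviate from the trailing window's
    mean by more than `threshold` population standard deviations."""
    history = deque(maxlen=window)  # only the most recent `window` readings
    flagged = []
    for i, v in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip flat signals (sigma == 0) to avoid flagging every change.
            if sigma > 0 and abs(v - mu) > threshold * sigma:
                flagged.append(i)
        history.append(v)  # add after the check so a spike can't mask itself
    return flagged


# Usage: a periodic, low-noise signal with one injected spike.
readings = [1.0 + 0.01 * (i % 5) for i in range(50)]
readings[40] = 2.0
print(detect_excursions(readings))  # [40]
```

Using a trailing window (rather than one centered on the current point) keeps the detector causal, so it can run on live data as readings arrive.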

Tools of the Big Data Trade

Using the right tools for big data projects is important. The best-known open source analytics tool is Apache Hadoop, although the Hadoop ecosystem is really a collection of tools and frameworks used to build big data applications.

As described in the Intel Blueprint, data are collected through a variety of means and stored in the Hadoop Distributed File System (HDFS). From there, the data can be analyzed using a variety of tools.

In terms of shop floor analytics, one of the most important metrics is the speed at which data can be analyzed and turned into insight. In a traditional Hadoop batch environment, job turnaround time is often far less critical than it is in a production environment. Two tools that have proven both easy to use and fast to respond are the R language and Apache Spark. R has a long history as a high-level, easy-to-use data analysis and statistics tool. Spark, on the other hand, is relatively new to the analytics market, but its in-memory computing model can dramatically speed up analytic jobs.

Big Memory for Big Data: Numascale Analytic Appliances

One of the challenges facing R applications in the data analytics space is fitting the entire problem into system memory. Because R normally holds its working data in RAM, applications with “larger-than-memory” footprints cannot be run on a single server. Spark users face a similar challenge because Spark relies on in-memory processing. In both cases, the memory capacity of a server limits performance, application size, and complexity.
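Whether a data set will fit can be estimated before loading it. This sketch assumes an all-numeric table stored as 8-byte doubles (the representation R uses for numeric vectors) and applies a hypothetical `working_copies` factor of 3 as a rule of thumb for the temporary copies an analysis may create. The table and server sizes are illustrative only.

```python
BYTES_PER_DOUBLE = 8  # R numeric vectors hold 8-byte double-precision values
GiB = 1024**3


def dataset_bytes(rows, numeric_cols):
    """Raw size of an all-double table, ignoring per-object overhead."""
    return rows * numeric_cols * BYTES_PER_DOUBLE


def fits_in_memory(rows, numeric_cols, ram_bytes, working_copies=3):
    """Rough check: does the data, times an assumed number of working
    copies (hypothetical rule of thumb), fit in the given RAM?"""
    return dataset_bytes(rows, numeric_cols) * working_copies <= ram_bytes


# Hypothetical sensor table: 1 billion rows x 20 numeric columns.
need = dataset_bytes(10**9, 20)
print(f"{need / GiB:.1f} GiB of raw doubles")  # 149.0 GiB of raw doubles
print(fits_in_memory(10**9, 20, 256 * GiB))    # False
print(fits_in_memory(10**9, 20, 768 * GiB))    # True
```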

The Numascale in-memory Analytics Appliance presents a simple solution to this problem. These appliances are designed to run the entire data analytics workload completely in-memory. This approach provides significant speed-up compared to traditional disk-based Hadoop clusters and data warehouses.

Through Numascale’s unique shared memory interconnect technology, Numascale systems can scale from 768 GB of RAM with 192 x86_64 cores to 12 TB of RAM with 3072 x86_64 cores, all running as a single shared memory server. There is no cluster of operating systems or nodes to manage. The design gives data scientists and power users supercomputer-class resources, thousands of cores and terabytes of RAM, with a desktop PC-like experience.
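The two endpoints of that range keep the same memory-to-core ratio. Assuming binary units (1 TB = 1,024 GB), the arithmetic works out as follows:

```latex
\frac{768\ \text{GB}}{192\ \text{cores}} = 4\ \text{GB/core},
\qquad
\frac{12\ \text{TB}}{3072\ \text{cores}} = \frac{12{,}288\ \text{GB}}{3072\ \text{cores}} = 4\ \text{GB/core}

\frac{12{,}288\ \text{GB}}{768\ \text{GB}} = \frac{3072\ \text{cores}}{192\ \text{cores}} = 16
```

In other words, the largest configuration is 16 times the smallest in both memory and core count.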

The software stack on top of the Numascale computing infrastructure includes a suite of open source analytics software. The unique scalable in-memory computing architecture speeds up Spark’s in-memory analytics engine and satisfies R’s appetite for memory. An optimized version of the MonetDB OLAP database is offered for real-time analytics data marts.

Numascale’s R Appliance

R is the world’s most popular statistical programming language and environment. The Numascale R Appliance provides R applications with all the memory they need for massive data set computations that are difficult or impossible to run on clustered systems. The R Appliance is configured for optimal R-based analytics, taking into account R memory requirements, CPU cores, and hard disk resources configured with RAID6/LVM for balanced I/O and storage performance. The Numascale R Appliance comes with Revolution Analytics’ Revolution R Open or Revolution R Enterprise for maximum performance, as well as RStudio, pre-installed and configured.

Numascale’s Spark Appliance

Apache Spark is a fast and general engine for large-scale in-memory data processing. Its processing model and design allow it to perform many of the Hadoop MapReduce operations used in traditional data analytics. Numascale’s Apache Spark Appliance is significantly faster than a disk-based Hadoop cluster running MapReduce jobs, which leads to TCO savings in server infrastructure, software license and support, and cluster care and management costs. Spark also provides APIs in Scala, Java, and Python.
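To illustrate the MapReduce pattern that Spark generalizes, here is a pure-Python sketch (no Spark or Hadoop involved) that counts error events per machine in hypothetical log lines through explicit map, shuffle, and reduce steps. In PySpark the same job could be written roughly as `rdd.flatMap(map_phase).reduceByKey(lambda a, b: a + b)`, with the added option of caching intermediate datasets in memory for reuse across jobs.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical machine log lines: "<machine_id> <event_code>"
LOG_LINES = [
    "M1 E42",
    "M2 OK",
    "M1 E42",
    "M3 E17",
    "M1 OK",
    "M3 E17",
]


def map_phase(line):
    """Emit (machine, 1) for every error event and nothing for OK events."""
    machine, code = line.split()
    return [(machine, 1)] if code != "OK" else []


def shuffle(pairs):
    """Group emitted values by key, as the MapReduce shuffle step does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine each key's values with an associative function (here, +)."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


pairs = [pair for line in LOG_LINES for pair in map_phase(line)]
error_counts = reduce_phase(shuffle(pairs))
print(error_counts)  # {'M1': 2, 'M3': 2}
```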

Numascale’s Database Appliance

The MonetDB column-store database, integrated with R, is an ideal foundation for database-centric analytics. MonetDB takes maximum advantage of in-memory processing, such as that found in the Numascale Database Appliance, and innovates at all layers of a DBMS. The Numascale team is working closely with the MonetDB team to further optimize MonetDB and MonetDB/R for the Numascale architecture.
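Why a column store suits analytics can be seen from its data layout. The sketch below uses hypothetical sensor records and Python's standard `array` module to store the same table row-wise and column-wise; averaging one attribute in the columnar layout reads only that attribute's contiguous, typed array. It illustrates the layout idea only and is not how MonetDB is implemented internally.

```python
from array import array

# Hypothetical sensor records: (timestamp, machine_id, temperature, vibration)
ROWS = [
    (1000, 1, 71.5, 0.21),
    (1001, 2, 69.8, 0.35),
    (1002, 1, 72.1, 0.19),
    (1003, 3, 70.4, 0.44),
]


def avg_temperature_rows(rows):
    """Row store: averaging one field still walks every full record."""
    return sum(r[2] for r in rows) / len(rows)


# Column store: each attribute kept in its own contiguous, typed array.
COLUMNS = {
    "timestamp": array("q", (r[0] for r in ROWS)),
    "machine_id": array("i", (r[1] for r in ROWS)),
    "temperature": array("d", (r[2] for r in ROWS)),
    "vibration": array("d", (r[3] for r in ROWS)),
}


def avg_temperature_columns(columns):
    """Column store: only the temperature array is read."""
    temps = columns["temperature"]
    return sum(temps) / len(temps)


print(avg_temperature_rows(ROWS), avg_temperature_columns(COLUMNS))
```

Both functions return the same average; the difference is how much unrelated data must be scanned, which is what makes columnar layouts efficient for aggregate-heavy OLAP queries on wide tables.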

Figure 1: Numascale Data Analytics. The Numascale Data Analytics appliance offers unique, cost-effective, very large memory capacity to many important analytics tools.