October 5, 2015

Big Payoff for Big Data: IoT in Manufacturing

Tools of the Big Data Trade

Using the right tools for big data projects is important. The best-known open source analytics tool is Apache Hadoop, although the Hadoop ecosystem is really a collection of tools and frameworks used to build big data applications.

As described in the Intel Blueprint¹, data are collected through a variety of means and stored in the Hadoop Distributed File System (HDFS). From there, the data can be analyzed using a variety of tools.

For shop floor analytics, one of the most important metrics is the speed at which data can be analyzed and insights realized. In a traditional Hadoop batch environment, job turnaround time is often not nearly as critical as it is in a production environment. Two tools that combine ease of use with rapid response are the R language and Apache Spark. R has a long history as a high-level, easy-to-use data analysis and statistics tool. Spark, on the other hand, is relatively new to the analytics market, but offers a form of in-memory computing that provides dramatic speed increases for analytic jobs.

Big Memory for Big Data: Numascale Analytic Appliances

One of the challenges facing R applications in the data analytics space is fitting the entire problem into system memory. Applications with “larger than memory” footprints cannot be run on a single server. A similar challenge faces the Spark user because Spark relies on in-memory processing. In both cases the memory capacity of a server limits performance, application size, and complexity.
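To make the “larger than memory” constraint concrete, the following is a rough back-of-the-envelope sketch rather than a sizing tool. It assumes a dense table of 8-byte doubles (the size R uses for numeric values) and an assumed factor of three working copies to allow for temporaries; the table dimensions and the RAM sizes in the loop are hypothetical.

```python
# Rough estimate of whether a dense numeric table fits in a server's RAM.
# All figures here are illustrative assumptions, not measurements.

BYTES_PER_DOUBLE = 8  # R stores each numeric (double) value in 8 bytes


def footprint_gib(rows: int, cols: int, working_copies: int = 1) -> float:
    """Approximate size in GiB of `working_copies` copies of a rows x cols table of doubles."""
    return rows * cols * BYTES_PER_DOUBLE * working_copies / 2**30


def fits_in_memory(rows: int, cols: int, ram_gib: float, working_copies: int = 3) -> bool:
    """True if the table plus the assumed temporary copies fits in `ram_gib` of RAM."""
    return footprint_gib(rows, cols, working_copies) <= ram_gib


if __name__ == "__main__":
    rows, cols = 2_000_000_000, 50  # hypothetical sensor table
    print(f"one copy: {footprint_gib(rows, cols):.1f} GiB")
    for ram_gib in (256, 768, 12 * 1024):  # hypothetical server sizes
        print(f"{ram_gib:>6} GiB RAM -> fits: {fits_in_memory(rows, cols, ram_gib)}")
```

The working-copy factor is the least certain input: how many temporaries an analysis creates depends on the code being run, so it is worth measuring on a sample before trusting an estimate like this.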

Numascale in-memory Analytics Appliances present a simple solution to this problem. They are designed to run the entire data analytics workload in memory. This approach provides a significant speed-up compared to traditional disk-based Hadoop clusters and data warehouses.

Through Numascale’s unique shared memory interconnect technology, Numascale systems can scale from 768GB RAM with 192 x86_64 cores to 12TB RAM with 3072 x86_64 cores running as a single shared memory server. There is no cluster of operating systems or nodes to manage. The design provides data scientists and power users with supercomputer resources and a PC desktop-like experience, with thousands of cores and terabytes of RAM.
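The two configurations quoted above can be compared with a little arithmetic. The sketch below assumes binary units (1 TB = 1024 GB), which the text does not specify; under that assumption both ends of the range work out to the same memory per core, and the largest system is a 16-fold scale-up of the smallest in both RAM and cores.

```python
# Memory-per-core arithmetic for the two configurations quoted in the text.
# Assumption: binary units (1 TB = 1024 GB); the article does not say which it uses.

GB_PER_TB = 1024

CONFIGS = {
    "smallest": {"ram_gb": 768, "cores": 192},
    "largest": {"ram_gb": 12 * GB_PER_TB, "cores": 3072},
}


def gb_per_core(ram_gb: int, cores: int) -> float:
    """RAM available per core, in GB."""
    return ram_gb / cores


def scale_factor(small: dict, large: dict, key: str) -> float:
    """How many times larger `large[key]` is than `small[key]`."""
    return large[key] / small[key]


if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(f"{name}: {gb_per_core(cfg['ram_gb'], cfg['cores']):.1f} GB per core")
    small, large = CONFIGS["smallest"], CONFIGS["largest"]
    print(f"RAM scale-up:  {scale_factor(small, large, 'ram_gb'):.0f}x")
    print(f"core scale-up: {scale_factor(small, large, 'cores'):.0f}x")
```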

The software stack on top of the Numascale computing infrastructure comes with a suite of open source analytics software. The unique scalable in-memory computing architecture speeds up Spark’s in-memory analytics engine, and serves R’s hunger for memory. An optimized version of the MonetDB OLAP database is offered for real-time analytics data markets.

Numascale’s R Appliance

R is the world’s most popular statistical programming language and environment. The Numascale R Appliance provides R applications with all the memory they need to achieve the massive data set computations that are not possible on clustered systems. The R Appliance is configured for optimal R-based analytics, taking into account R memory requirements, CPU cores, and hard disk resources configured with RAID6/LVM for balanced IO/storage performance. The Numascale R Appliance includes Revolution Analytics’ R Open or R Enterprise for maximum performance, as well as RStudio, pre-installed and configured.

Numascale’s Spark Appliance

Apache Spark is a fast and general engine for large-scale in-memory data processing. Its processing model and design allow it to perform many of the Hadoop MapReduce operations used in traditional data analytics. Numascale’s Apache Spark Appliance is significantly faster than a disk-based Hadoop cluster running MapReduce jobs, which leads to TCO savings in terms of server infrastructure costs, software license and support costs, and cluster care and management costs. Spark also provides APIs in Scala, Java, and Python.
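For readers unfamiliar with the MapReduce pattern mentioned above, here is a minimal sketch of its map, shuffle, and reduce phases run entirely in memory. It is plain Python, not the Spark API, and the machine names and temperature readings are hypothetical. In Spark's Python API, a comparable job would typically be written with `map` and `reduceByKey` on a distributed dataset.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical shop-floor readings: (machine_id, temperature in degrees C).
READINGS = [
    ("press-1", 71.5),
    ("press-2", 64.0),
    ("press-1", 73.0),
    ("lathe-7", 58.5),
    ("press-2", 66.0),
]


def map_phase(records):
    """Emit one (key, (sum, count)) pair per input record."""
    return [(machine, (temp, 1)) for machine, temp in records]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine each key's (sum, count) pairs, then turn the totals into means."""
    def combine(a, b):
        return (a[0] + b[0], a[1] + b[1])

    totals = {key: reduce(combine, values) for key, values in grouped.items()}
    return {key: total / count for key, (total, count) in totals.items()}


def mean_temperature(records):
    """Mean temperature per machine via map -> shuffle -> reduce."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    for machine, mean in sorted(mean_temperature(READINGS).items()):
        print(f"{machine}: {mean:.2f}")
```

Carrying a (sum, count) pair through the reduce step, rather than averaging early, keeps the combine function associative, which is what lets a framework apply it to partial results in any order.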

Numascale’s Database Appliance

The MonetDB column-store database integrated with R is an ideal foundation for a database-centric analytics platform. MonetDB takes maximum advantage of in-memory processing, such as that found in the Numascale Database Appliance, and innovates at all layers of a DBMS. The Numascale team is working closely with the MonetDB team to further optimize MonetDB and MonetDB/R for the Numascale architecture.
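The idea behind a column store can be illustrated with a small sketch. This is a toy model in plain Python, not MonetDB's implementation, and the table contents are hypothetical. It contrasts a row layout, where an aggregate over one field has to walk every full record, with a column layout, where the same aggregate scans one contiguous typed array.

```python
from array import array

# Hypothetical rows: (machine_id, cycle_time_s, temperature_c).
ROWS = [
    (1, 12.5, 71.5),
    (2, 11.0, 64.0),
    (1, 13.0, 73.0),
    (3, 10.5, 58.5),
]


def to_columns(rows):
    """Column layout: one contiguous typed array per field."""
    return {
        "machine_id": array("q", (r[0] for r in rows)),      # signed 64-bit ints
        "cycle_time_s": array("d", (r[1] for r in rows)),    # 64-bit floats
        "temperature_c": array("d", (r[2] for r in rows)),   # 64-bit floats
    }


def mean_temp_row_store(rows):
    """Row layout: every record is visited even though only one field is needed."""
    return sum(r[2] for r in rows) / len(rows)


def mean_temp_column_store(columns):
    """Column layout: only the temperature array is scanned."""
    temps = columns["temperature_c"]
    return sum(temps) / len(temps)


if __name__ == "__main__":
    columns = to_columns(ROWS)
    print(mean_temp_row_store(ROWS), mean_temp_column_store(columns))
```

Both functions return the same answer; the difference is in how much data has to be touched to get it, which matters more as tables grow wider.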

Figure 1: The Numascale Data Analytics appliance offers unique and cost-effective huge memory capability to many important analytics tools.

¹http://www.intel.com/content/dam/www/program/embedded/internet-of-things/blueprints/iot-business-value-manufacturing-blueprint.pdf
