August 22, 2018

Popping Big Data Fallacies On the Edge

Alex Woodie

(Risto Viita/Shutterstock)

Organizations today are drowning in data. There’s no argument about that. But there continues to be vigorous debate on the best way to deal with that data. While some advocate creating big data lakes to store data that will subsequently be used for training machine learning models, there’s a growing chorus of voices calling for a simpler and more real-time approach.

You can count Simon Crosby, CTO of SWIM.ai, among the proponents of a lighter-weight and less expensive approach to data collection and analysis, at least for a certain class of real-world machine learning problems at the edge. During a recent conversation with Datanami, Crosby threw cold water on the notion that uploading data to the cloud for storage and machine learning is the best way to get value out of the morass of data created on edge devices.

“There is this kind of fiction out there advanced by Microsoft, Google, and Amazon that everybody is going to get data scientists and build some sort of machine learning model and push it to the edge,” Crosby says. “It’s just nonsense. It’s just not going to happen.”

The former Citrix Systems CTO is a firm believer in an alternative model that’s based on incremental learning and a stateful recording of the world. Instead of sending huge amounts of data to the cloud and incurring the overhead of database I/O and GPU training time – not to mention a hefty telecom bill or a data scientist’s salary – Crosby makes a good argument that most of the value can be wrung from sensor data close to where it was created and in real time.

“SWIM.ai’s approach to this is we learn on data on the fly,” he says. “It just needs to be exposed to a large amount of streaming data and we self-train using very compact models at the edge, and deliver insights at the edge.”

SWIM.ai generates insights on the edge using incremental learning

The company’s SWIM EDX runs on low-power edge devices, such as Arm CPUs, Raspberry Pis, and Nvidia Jetson TX2s. Instead of uploading sensor data to cloud data repositories or other centralized data lakes to train machine learning models, the SWIM EDX offering uses a small neural network to do training onsite, via a “digital twin” that’s maintained in a few dozen MB of memory.

“There’s a vibrant field of learning called incremental learning where, if a data source has certain statistical properties, then you learn continually,” Crosby explains. “The difference between how the real world turns out and your guess is your error. You back propagate your error through the network.”
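The loop Crosby describes (guess, compare the guess with what actually happened, feed the error back) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SWIM.ai's implementation: the model is a single linear layer, so "back propagation" reduces to a one-step error correction, and the class name, learning rate, and synthetic sensor stream are all hypothetical.

```python
import random


class OnlineLinearModel:
    """Tiny one-layer model trained one observation at a time (hypothetical)."""

    def __init__(self, n_features, learning_rate=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, actual):
        """Guess, compare with reality, and push the error back into the weights."""
        error = actual - self.predict(x)
        for i, xi in enumerate(x):
            self.weights[i] += self.learning_rate * error * xi
        self.bias += self.learning_rate * error
        return error


def synthetic_stream(n, seed=0):
    """Stand-in for a sensor feed: y = 2*x0 - 1*x1 + 0.5 (assumed, noise-free)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, 2.0 * x[0] - 1.0 * x[1] + 0.5


model = OnlineLinearModel(n_features=2)
# Each observation is used once and then discarded; nothing is stored for later training.
errors = [abs(model.update(x, y)) for x, y in synthetic_stream(5000)]

early = sum(errors[:100]) / 100
late = sum(errors[-100:]) / 100
print(f"mean error, first 100 updates: {early:.4f}")
print(f"mean error, last 100 updates:  {late:.6f}")
```

The only state kept between updates is the handful of weights, which is why a model of this kind can fit in a small memory footprint on an edge device.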

The lightweight approach doesn’t work for all types of data or machine learning workloads. For instance, many natural language processing problems would require you to maintain a large corpus of data for training. If you want to create predictions for people’s behavior, it’s best to have a large database that describes the behavior of people over time.

But for many edge use cases that involve time-series data — such as understanding traffic patterns in a city or predicting equipment failures in the field — incremental learning can be a big time- and money-saver for those suffering under the weight of big data, Crosby says.

Crosby uses the City of Palo Alto’s traffic light network – an actual SWIM.ai customer that generates 4TB of traffic data per day from thousands of sensors – to compare the math behind a cloud-based data processing model with the SWIM.ai approach, which processes the data in a city data center.

“If I take all the data from a city and bring it up to Amazon and use Lambda to try to learn on it, essentially what happens is this: Every second, every sensor in that infrastructure is giving me an update, and I have to go and hit the cloud, which is going to read the previous state of this thing out of a database, and write the new state. End to end, for every update, is several hundred milliseconds,” he says.

Nvidia’s Jetson TX1

“Meanwhile, on the edge using just some generic Arm processor – we’re funded by Arm so let’s just make it an Arm CPU – I’m running on a nanosecond timescale. I’ve already got close to a billion free CPU cycles just from the idle time that it would have taken to [run the Lambda calls] on the cloud. So we can scavenge billions of free CPU cycles at the edge. If I wanted to use an Nvidia Jetson TX2, a 64-bit Arm processor and a small GPU, that $200 board solves me the entire edge learning problem for several cities, versus about $5,000 a month for AWS” in Lambda calls alone.
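The arithmetic behind Crosby's comparison can be checked with a short back-of-envelope script. The 4TB per day, the $200 board, and the $5,000 monthly bill come from the article; the round-trip times (300 ms and 500 ms as stand-ins for "several hundred milliseconds"), the clock speeds, decimal terabytes, and the 30-day month are assumptions chosen only for illustration.

```python
# Back-of-envelope figures. Values marked "assumed" are illustrative, not from SWIM.ai.
SECONDS_PER_DAY = 24 * 60 * 60

# 4 TB of traffic data per day (decimal terabytes assumed).
bytes_per_day = 4 * 10**12
megabytes_per_second = bytes_per_day / SECONDS_PER_DAY / 10**6
megabits_per_second = megabytes_per_second * 8


def idle_cycles(round_trip_ms: int, clock_hz: int) -> int:
    """CPU cycles an edge core could execute while one cloud round trip completes."""
    return round_trip_ms * clock_hz // 1000


# Assumed sample points: 300 ms at 1 GHz and 500 ms at 2 GHz.
low = idle_cycles(300, 1_000_000_000)
high = idle_cycles(500, 2_000_000_000)

# One-time $200 board versus about $5,000 a month in Lambda calls.
board_cost = 200
cloud_cost_per_month = 5_000
cloud_cost_per_year = 12 * cloud_cost_per_month
days_to_break_even = board_cost / (cloud_cost_per_month / 30)  # 30-day month assumed

print(f"Sustained uplink needed: {megabytes_per_second:.1f} MB/s "
      f"({megabits_per_second:.0f} Mbit/s)")
print(f"Idle cycles per cloud round trip: {low:,} to {high:,}")
print(f"Cloud cost per year: ${cloud_cost_per_year:,}")
print(f"Board pays for itself in about {days_to_break_even:.1f} days")
```

Under these assumptions, a single cloud round trip leaves an edge core with hundreds of millions to a billion unused cycles, which is consistent with the "close to a billion" figure in the quote. The cost comparison covers only the two numbers Crosby cites and ignores power, hosting, and operations on either side.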

The three-year-old startup has provided similar economic benefits to an aircraft component manufacturer that was struggling to read all the RFID tags in real time. The company maintains 2,000 RFID readers recording 500 reads per second in a manufacturing plant that’s a mile long, and they were “paying Oracle through the teeth for logging all these reads,” Crosby says.

Instead of logging all the RFID reads into the relational database, the SWIM.ai solution utilized a series of digital twins running on two Raspberry Pi devices that maintained the state of all the components attached to the RFID tags. “Suddenly this customer has gone from dying of tag reads going into the database, which they have to process later, to being able to watch wheel sub-assemblies come together in real time and predict when they’re going to be done.”
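The difference between logging every read and maintaining state can be shown with a minimal sketch. It assumes a "digital twin" is simply an in-memory object per tagged component that each read overwrites; the class names, fields, and sample tag and reader IDs are hypothetical and are not taken from SWIM.ai's software.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ComponentTwin:
    """In-memory stand-in for one tagged component (hypothetical structure)."""
    tag_id: str
    last_reader: Optional[str] = None
    last_seen: Optional[float] = None
    read_count: int = 0

    def observe(self, reader_id: str, timestamp: float) -> None:
        # Overwrite the previous state instead of appending a new log row.
        self.last_reader = reader_id
        self.last_seen = timestamp
        self.read_count += 1


class TwinRegistry:
    """Keeps one twin per tag, so memory grows with components, not with reads."""

    def __init__(self) -> None:
        self.twins: Dict[str, ComponentTwin] = {}

    def on_read(self, tag_id: str, reader_id: str, timestamp: float) -> ComponentTwin:
        twin = self.twins.get(tag_id)
        if twin is None:
            twin = self.twins[tag_id] = ComponentTwin(tag_id)
        twin.observe(reader_id, timestamp)
        return twin

    def components_at(self, reader_id: str) -> List[str]:
        """Tags whose most recent read came from the given reader."""
        return sorted(t.tag_id for t in self.twins.values()
                      if t.last_reader == reader_id)


registry = TwinRegistry()
sample_reads = [
    ("wheel-1", "reader-A", 0.0),
    ("wheel-2", "reader-A", 0.5),
    ("wheel-1", "reader-B", 1.0),
    ("wheel-1", "reader-B", 1.5),  # repeat read: updates state, adds no new record
]
for tag_id, reader_id, timestamp in sample_reads:
    registry.on_read(tag_id, reader_id, timestamp)

print(registry.components_at("reader-A"))
print(registry.components_at("reader-B"))
```

Four reads produce only two twins here, and a question such as "what is at reader B right now?" is answered from memory rather than by querying a table of historical reads.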

As the speed and density of edge computing devices go up, the need to shunt data into the cloud for after-the-fact analysis will decline, Crosby predicts. That’s not true for every use case, but a strong argument can be made for utilizing a more intelligent technique for edge cases on the Internet of Things (IoT).

“Why are we storing all this stuff?” Crosby asks. “Because 10 years ago, Cloudera came out of Google and said, ‘People, you need to store all your data.’ Other than that, we wouldn’t be doing it. It’s just not necessary.”

SWIM.ai was founded by Rusty Cumpston and Chris Sachs in 2015. The San Jose, California-based company has received $11 million in funding, including an investment by Arm in a $10-million Series B round that closed last month.
