MIT Uses Video to Train Machine Vision System
Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory reported this week they have developed a deep learning algorithm that could help machines using predictive vision anticipate human interactions. The approach uses unlabeled YouTube videos as its source material to train deep networks to predict human interactions.
In a paper titled “Anticipating Visual Representations from Unlabeled Video,” the researchers said they applied recognition algorithms to the trained network’s predictions to forecast future actions. That prediction capability could be used in applications such as training robots to understand that a greeting in the form of a wave could lead to a handshake or an embrace.
“The capability for machines to anticipate future concepts before they begin is a key problem in computer vision that will enable many real-world applications,” the MIT researchers asserted. “We believe abundantly available unlabeled videos are an effective resource we can use to acquire knowledge about the world, which we can use to learn to anticipate the future.”
The key to their approach was that readily available unlabeled video could be used to train deep neural networks to predict visual representations. Those representations “are a promising prediction target because they encode images at a higher semantic level than pixels yet are automatic to compute,” the MIT team noted.
They then applied recognition algorithms to the predicted representations to anticipate future actions and objects. The researchers said they validated experimentally that actions could be predicted one second into the future while objects could be estimated about five seconds ahead.
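The two-stage pipeline described above — regress from the current frame’s representation to a predicted future representation, then run a recognizer on the prediction — can be sketched as follows. This is a minimal illustration only: the toy linear map stands in for the deep regression network, the feature vectors are random stand-ins for real frame representations, and all names and category labels (e.g. “handshake”, “hug”) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each video frame is encoded as a fixed-length visual
# representation (the approach predicts representations, not raw pixels).
DIM = 8

# Toy stand-in for the deep regression network: a linear map fit by least
# squares on (current-frame, future-frame) representation pairs harvested
# from unlabeled video.
current = rng.normal(size=(100, DIM))                  # representations at time t
future = current @ (rng.normal(size=(DIM, DIM)) * 0.5)  # representations ~1s later

W, *_ = np.linalg.lstsq(current, future, rcond=None)

def predict_future(rep):
    """Predict the visual representation about one second ahead."""
    return rep @ W

# Stand-in recognizer trained on a small labeled set: nearest class
# centroid in representation space.
centroids = {"handshake": rng.normal(size=DIM), "hug": rng.normal(size=DIM)}

def recognize(rep):
    return min(centroids, key=lambda c: np.linalg.norm(rep - centroids[c]))

# Anticipate the action: classify the *predicted* future representation,
# not the current frame.
anticipated = recognize(predict_future(current[0]))
print(anticipated)
```

The point of the sketch is the ordering: the regressor is trained on abundant unlabeled video, while only the final recognizer needs labeled examples of the target categories.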
The investigators also said their work builds on previous “big visual data” efforts, such as the millions of online images used to build object and scene recognition systems. They took this approach one step further by mining information from online video, with the goal of developing a training model for machine vision that can anticipate subtle semantic concepts.
Because the training data did not have to be labeled, ample online video from popular TV shows could be fed to the researchers’ proposed deep regression networks. A relatively small set of labeled examples from the desired task was then used to indicate specific categories of human interaction. At test time, the researchers applied standard recognition algorithms to the predicted representation in order to forecast a category.
“There’s a lot of subtlety to understanding and forecasting human interactions,” noted lead MIT researcher Carl Vondrick. “We hope to be able to work off of this example to be able to soon predict even more complex tasks.”
Outside experts said MIT’s approach is a good start, but practical applications are still years away. “This experiment represents a good scenario to apply machine learning because it uses a small set of inputs and possible outputs, so the investment in training is reasonable,” noted Marco Varone, founder and CTO of Expert System, a cognitive computing specialist.
“It is important to note that real-world scenarios are often more complex, as is the case in our sector of natural language understanding,” Varone added. “Here, the costs and resources required are significantly higher and so is the complexity, requiring skills not always accessible to organizations. So while we certainly welcome this research, the real use in the real world is still years away.”