May 13, 2022

MIT Advances Unsupervised Computer Vision with ‘STEGO’

Training machine learning models often means working with labeled data. For computer vision tasks, this might mean, for instance, an hour of camera footage from a car, meticulously sectioned by humans into roads, road signs, vehicles, pedestrians, and so forth. Labeling even that small amount of data can take a human hundreds of hours, bottlenecking the training process. Now, researchers from MIT's Computer Science & Artificial Intelligence Laboratory (CSAIL) are introducing a new, state-of-the-art algorithm for unsupervised computer vision that operates without any human labels.

The model is called STEGO, short for "Self-supervised Transformer with Energy-based Graph Optimization." STEGO performs semantic segmentation, the process of labeling every pixel in an image with a class. Historically, semantic segmentation has been easiest for discrete objects like people or vehicles, and harder for more amorphous, blended elements of the environment like clouds or bushes, or for irregular structures like cancers.
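Concretely, a semantic segmentation is just a grid of class IDs aligned with the image, one per pixel. A toy sketch of that data structure (the class names and values here are illustrative, not taken from the paper):

```python
import numpy as np

# A semantic segmentation assigns one class ID to every pixel.
# Toy 4x6 "image" with three illustrative classes.
CLASSES = {0: "road", 1: "vehicle", 2: "sky"}  # hypothetical label set

segmentation = np.array([
    [2, 2, 2, 2, 2, 2],
    [2, 2, 1, 1, 2, 2],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
])

# Per-class pixel counts, e.g. to measure how much of the scene each class covers.
for class_id, name in CLASSES.items():
    print(f"{name}: {(segmentation == class_id).sum()} pixels")
```

A supervised model learns these per-pixel labels from human annotations; STEGO's contribution is producing such maps with no labels at all.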

“If you’re looking at oncological scans, the surface of planets, or high-resolution biological images, it’s hard to know what objects to look for without expert knowledge. In emerging domains, sometimes even human experts don’t know what the right objects should be,” explained Mark Hamilton, a research affiliate of MIT CSAIL, software engineer at Microsoft, and lead author of the paper describing STEGO, in an interview with MIT’s Rachel Gordon. “In these types of situations where you want to design a method to operate at the boundaries of science, you can’t rely on humans to figure it out before machines do.”

STEGO is built on top of the DINO algorithm, which was itself trained on 14 million images. The researchers tested STEGO on a variety of benchmarks, including the highly diverse COCO-Stuff image dataset. They reported that STEGO doubled the performance of prior unsupervised computer vision models on the COCO-Stuff benchmark, and that it performed similarly well on other benchmarks, including driverless-car and space-imagery datasets.
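STEGO's actual training objective distills feature correspondences from DINO with a learned segmentation head, but the intuition behind building on a self-supervised backbone can be sketched more simply: DINO's patch features already group semantically similar regions, so even naive clustering of them yields a rough, label-free segmentation. A minimal sketch of that idea (the input filename and cluster count are arbitrary assumptions; this is not STEGO's method):

```python
# Minimal sketch: unsupervised segmentation by clustering DINO patch
# features. This is NOT STEGO's algorithm, only the intuition behind
# starting from a self-supervised backbone.
import torch
from PIL import Image
from torchvision import transforms
from sklearn.cluster import KMeans

# Load the self-supervised DINO ViT-S/16 backbone via torch.hub.
model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
img = preprocess(Image.open("scene.jpg").convert("RGB")).unsqueeze(0)  # hypothetical input image

with torch.no_grad():
    # Last-layer tokens have shape (1, 1 + 196, 384) for a 224x224 input;
    # drop the leading CLS token to keep only the 14x14 grid of patch features.
    feats = model.get_intermediate_layers(img, n=1)[0][:, 1:, :]

k = 5  # number of segments: a free choice here, not discovered by the model
labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats[0].numpy())
print(labels.reshape(14, 14))  # coarse per-patch pseudo-semantic map
```

STEGO improves on this kind of naive clustering by training a lightweight head that sharpens and propagates the backbone's feature correspondences, which is where the reported benchmark gains come from.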

“In making a general tool for understanding potentially complicated datasets, we hope that this type of an algorithm can automate the scientific process of object discovery from images,” Hamilton said. “There’s a lot of different domains where human labeling would be prohibitively expensive, or humans simply don’t even know the specific structure, like in certain biological and astrophysical domains. We hope that future work enables application to a very broad scope of datasets. Since you don’t need any human labels, we can now start to apply ML tools more broadly.”
