December 4, 2020

Googlers Speak Out on the Scourge of ML Underspecification

Oliver Peckham

(Michael-Traitov/Shutterstock)

A few days ago, 40 authors (all but a handful hailing from Google) published a 59-page paper. The topic at hand: why so many machine learning models that perform well in internal testing go on to fail spectacularly in real-world applications. The answer, the Googlers say, is underspecification – a blight on machine learning that, they stress, requires substantive solutions.

“An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain,” they write. In plain language: an underspecified pipeline can produce many different models that all explain the training data about equally well. The problem arises when researchers treat all of those models as equally valid based solely on held-out results from the training domain, without accounting for real-world factors that the training process never captured. In those situations, the authors say, “ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains.”
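To make the definition concrete, here is a minimal sketch in Python. Everything in it is a made-up illustration rather than something taken from the paper: the two features, the three hand-picked weight pairs, and the `mse` helper are all assumptions. In the toy training domain the second feature is an exact copy of the first, so every weight pair that sums to one scores identically on held-out data; once the features decouple, the predictors part ways.

```python
import random


def mse(weights, rows):
    """Mean squared error of the linear predictor w1*x1 + w2*x2 on (x1, x2, y) rows."""
    w1, w2 = weights
    return sum((w1 * x1 + w2 * x2 - y) ** 2 for x1, x2, y in rows) / len(rows)


rng = random.Random(0)  # fixed seed so the toy data is reproducible

# Toy training domain (assumption): x2 is an exact copy of x1, and the target is x1.
held_out = []
for _ in range(200):
    x = rng.uniform(-1, 1)
    held_out.append((x, x, x))

# Toy deployment domain (assumption): x2 no longer tracks x1.
shifted = []
for _ in range(200):
    x = rng.uniform(-1, 1)
    shifted.append((x, rng.uniform(-1, 1), x))

# Three hand-picked predictors whose weights all sum to one.
predictors = {
    "relies_on_x1": (1.0, 0.0),
    "splits_evenly": (0.5, 0.5),
    "relies_on_x2": (0.0, 1.0),
}

held_out_mse = {name: mse(w, held_out) for name, w in predictors.items()}
shifted_mse = {name: mse(w, shifted) for name, w in predictors.items()}

for name in predictors:
    print(f"{name}: held-out MSE {held_out_mse[name]:.4f}, shifted MSE {shifted_mse[name]:.4f}")
```

Held-out error alone cannot distinguish the three predictors here; only data from outside the training domain reveals which one leans on the feature that stops being reliable.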

By way of illustration, the Googlers highlight examples spanning “computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics.” In epidemiology, for instance, they discuss how early data from an epidemic (such as the COVID-19 pandemic) is easily explained by a variety of models that do not substantively account for major factors – such as the gradually diminishing number of susceptible people in an area as an epidemic infects (and then renders immune) larger and larger portions of the populace.

“Importantly, during the early stages of an epidemic … the parameters of the model are underspecified by this training task,” they write. “This is because, at this stage, the number of susceptible is approximately constant at the total population size (N), and the number of infections grows approximately exponentially.”

As a result, they say, “arbitrary choices in the learning process” determine which parameters are deemed most predictive by the model, despite different models predicting “peak infection numbers, for example, that are orders of magnitude apart.”

An example of underspecification in epidemic modeling. Image courtesy of the researchers.
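The epidemic example can be reproduced with a toy simulation. The sketch below assumes a textbook SIR (susceptible-infected-recovered) model integrated with a simple forward-Euler step; the parameter values, population size, and function name are illustrative assumptions, not figures from the paper. Early on, when nearly everyone is still susceptible, infections grow at a rate of roughly `beta - gamma`, so two parameter pairs with the same difference fit the early data about equally well yet imply very different peaks.

```python
def simulate_sir(beta, gamma, population=1_000_000, initial_infected=10, days=365, dt=0.1):
    """Return the number of infected people at the end of each day (index 0 is the start)."""
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    steps_per_day = round(1 / dt)
    daily_infected = [infected]
    for _ in range(days):
        for _ in range(steps_per_day):
            # Forward-Euler update of the standard SIR equations.
            new_infections = beta * susceptible * infected / population * dt
            recoveries = gamma * infected * dt
            susceptible -= new_infections
            infected += new_infections - recoveries
        daily_infected.append(infected)
    return daily_infected


# Two hypothetical fits with the same early growth rate (beta - gamma = 0.1 per day).
fit_a = simulate_sir(beta=0.15, gamma=0.05)
fit_b = simulate_sir(beta=1.10, gamma=1.00)

for day in (7, 14, 21):
    print(f"day {day}: fit A {fit_a[day]:.1f} infected, fit B {fit_b[day]:.1f} infected")
print(f"peak: fit A {max(fit_a):.0f} infected, fit B {max(fit_b):.0f} infected")
```

Someone who only sees the first few weeks of case counts has little basis for preferring one fit over the other, which is the sense in which the early data leaves the parameters underspecified.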

“We have seen that underspecification is ubiquitous in practical machine learning pipelines across many domains,” the researchers write. “Indeed, thanks to underspecification, substantively important aspects of the decisions are determined by arbitrary choices such as the random seed used for parameter initialization.”

So, the question remains: how should researchers address underspecification in the model design process?

“Our findings underscore the need to thoroughly test models on application-specific tasks, and in particular to check that the performance on these tasks is stable,” they write. In fact, they say, the “extreme complexity” of modern ML models makes it more or less certain that most models will be underspecified, and researchers must ensure that the inevitable underspecification “does not jeopardize the inductive biases that are required by an application.”
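One way to act on this advice is to train the same pipeline under several random seeds and compare the spread of a stress-test metric with the spread of the ordinary held-out metric. The sketch below does this for a deliberately tiny linear model fit by gradient descent; the data, the `make_rows`, `train`, and `mse` helpers, and the choice of stress test are all hypothetical, chosen only so that the seed is the sole difference between runs.

```python
import random


def make_rows(rng, n, coupled):
    """Rows of (x1, x2, y) with y = 2*x1; x2 copies x1 only when coupled is True."""
    rows = []
    for _ in range(n):
        x1 = rng.uniform(-1, 1)
        x2 = x1 if coupled else rng.uniform(-1, 1)
        rows.append((x1, x2, 2 * x1))
    return rows


def mse(weights, rows):
    """Mean squared error of the linear predictor w1*x1 + w2*x2."""
    w1, w2 = weights
    return sum((w1 * x1 + w2 * x2 - y) ** 2 for x1, x2, y in rows) / len(rows)


def train(rows, seed, steps=200, learning_rate=0.5):
    """Full-batch gradient descent; only the initial weights depend on the seed."""
    init = random.Random(seed)
    w1, w2 = init.gauss(0, 1), init.gauss(0, 1)
    for _ in range(steps):
        grad1 = grad2 = 0.0
        for x1, x2, y in rows:
            residual = w1 * x1 + w2 * x2 - y
            grad1 += 2 * residual * x1 / len(rows)
            grad2 += 2 * residual * x2 / len(rows)
        w1 -= learning_rate * grad1
        w2 -= learning_rate * grad2
    return w1, w2


data_rng = random.Random(123)
train_rows = make_rows(data_rng, 200, coupled=True)
held_out_rows = make_rows(data_rng, 200, coupled=True)   # same domain as training
stress_rows = make_rows(data_rng, 200, coupled=False)    # hypothetical stress test

held_out_mse, stress_mse = [], []
for seed in range(5):
    weights = train(train_rows, seed)
    held_out_mse.append(mse(weights, held_out_rows))
    stress_mse.append(mse(weights, stress_rows))

print(f"held-out MSE range: {min(held_out_mse):.6f} to {max(held_out_mse):.6f}")
print(f"stress MSE range:   {min(stress_mse):.6f} to {max(stress_mse):.6f}")
```

Because the two features are identical during training, gradient descent never changes the difference between the two weights, so whatever the seed chose at initialization survives into deployment. A wide stress-test range alongside a tight held-out range is the kind of instability the authors suggest checking for.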

The authors say that the best approach to the widespread underspecification problem will involve designing domain-specific stress tests that accurately represent the challenges a model will face in the real world.

“For example, within the medical risk prediction domain, the dimensions that a model is required to generalize across (e.g., temporal, demographic, operational, etc.) will depend on the details of the deployment and the goals of the practitioners,” they elaborate. “For this reason, developing best practices for building stress tests that crisply represent requirements, rather than standardizing on specific benchmarks, may be an effective approach.”

About the paper

The paper, titled “Underspecification Presents Challenges for Credibility in Modern Machine Learning,” is accessible to the public here.

