July 6, 2020

Data Science Team at Columbia to Enhance Probabilistic Programming

July 6, 2020 — A Columbia University research team affiliated with the Data Science Institute (DSI) has received a Facebook Probability and Programming research award to develop static analysis methods that will enhance the usability and accuracy of probabilistic programming.

The team includes Jeannette M. Wing, DSI’s Avanessians Director and Professor of Computer Science; Andrew Gelman, Professor of Statistics and Political Science and DSI member; and Ryan Bernstein, a doctoral student in computer science who is co-advised by Wing and Gelman. The three will conduct a static analysis of Stan, an open-source probabilistic programming language, developed mainly at Columbia, that is used to describe statistical models. Their analysis will make it easier for users to reliably design statistical and machine learning models in high-level programming languages, according to Gelman, who is a co-principal investigator on the award.

“Stan is used in applications ranging from drug development [for Novartis] to political polling and forecasting [for YouGov and The Economist],” Gelman said. “As the computational counterpart to Bayesian inference, probabilistic programming is an effective approach for computing under uncertainty, which is why it’s increasingly used in so many fields and why we are hoping to expand its capabilities in this project.”

Specifically, the team intends to draw upon probabilistic thinking to develop two tools. One tool will automate a step in the Bayesian model checking workflow, which will allow users to catch inconsistencies in their models without having to write extra code. A second tool will make the compiler more reliable by verifying its internal code transformations and warning users of common statistical and programming pitfalls.

Bernstein expects this research to help lower the level of expertise required to get high-quality, trustworthy results. “We hope to both expand the user base of people who can write statistical and machine learning models in a high-level programming language, and provide increased assurance to these users that their probabilistic programs have the intended behavior,” he said.

In January 2020, Facebook launched a request for proposals to support research that addresses fundamental problems at the intersection of machine learning, programming languages, statistics, and software engineering.

“While it was expected that the proposals would be excellent scientific directions in their own right, it is great they also are on topics of current interest to us from a more applied point of view,” said Satish Chandra, a software engineer at Facebook. “We look forward to having rich technical exchanges with each of the project teams.”

About Data Science Institute at Columbia University

The Data Science Institute at Columbia University trains the next generation of data scientists and develops innovative technology to serve society. With more than 350 affiliated faculty working in a wide range of disciplines, the Institute fosters collaboration in advancing techniques to gather and interpret data and addresses urgent problems facing society. The Institute also works closely with industry to bring promising ideas to market.


Source: Data Science Institute at Columbia University
