July 8, 2019

Program Synthesis Moves a Step Closer to Reality

George Leopold

As data scientists and software developers sort through the plethora of tools and APIs ranging from Python to Apache Spark, automation schemes are emerging to help programmers navigate those tools and the accompanying infrastructure that machine learning and other apps run on.

Among them is an emerging “programming-by-example” approach built on the Pandas library. Programmers, including those building AI applications, specify their intent as an example input and the desired output, and the framework “synthesizes” a program that produces that output from the input.

That goal is the basis of AutoPandas, a “program synthesis engine” for the popular data science library. Data scientist Ben Lorica noted in a recent blog post that investigators at the University of California at Berkeley’s RISELab recently unveiled new research on AutoPandas aimed squarely at making life a bit easier for harried software developers.

Founded in 2017, RISELab is the successor to AMPLab, which produced popular open source technologies like Apache Spark and Apache Mesos. Ion Stoica, co-founder of Databricks, is the director of RISELab.

As AI and machine learning move from research to IT and customer service applications, Lorica noted that researchers, driven by the emergence of capabilities like AutoML, are building new tools that promise to automate various stages of the machine learning pipeline.

Program synthesis is defined as the task of automatically finding a program, written in a given programming language, that satisfies the user’s intent. Neural-backed generators such as AutoPandas “are an extremely promising step toward practical program synthesis,” according to Lorica.

For example, programmers could specify a data input and output structure such as data frames. AutoPandas would then automatically synthesize a program that produces the desired output from the given input, Lorica explained.
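The input-output idea can be illustrated with a minimal sketch. It is not AutoPandas itself: the candidate list and the `synthesize` function below are hypothetical stand-ins, and the search is a plain loop over four one-step Pandas programs rather than anything neural or distributed.

```python
import pandas as pd

# Hypothetical candidate space: a handful of one-step pandas programs.
# A real synthesizer searches a far larger space of API calls and arguments.
CANDIDATES = {
    "df.T": lambda df: df.T,
    "df.drop_duplicates()": lambda df: df.drop_duplicates(),
    "df.sort_values(by=df.columns[0])": lambda df: df.sort_values(by=df.columns[0]),
    "df.cumsum()": lambda df: df.cumsum(),
}


def synthesize(inp, out):
    """Return the source text of the first candidate mapping inp to out, or None."""
    for source, program in CANDIDATES.items():
        try:
            result = program(inp.copy())
        except Exception:
            continue  # this candidate does not apply to the given input
        if result.equals(out):
            return source
    return None


# The programmer supplies only an example input and the output they want.
example_input = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
example_output = pd.DataFrame({"a": [1, 3, 6], "b": [10, 30, 60]})

found = synthesize(example_input, example_output)
print(found)
```

Here the only candidate consistent with the example is the running-total call, so that is what the search reports. The hard part in practice is that the real space of Pandas calls and arguments is far too large to enumerate this way.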

According to a recent paper on program synthesis, the approach differs fundamentally from traditional compilers used to translate high-level code to a lower-level machine language. By contrast, program synthesizers perform searches to generate a program consistent with intent. That capability is considered nothing less than the “Holy Grail” of computer science, the authors noted.

RISELab’s automation tool uses “program generators” to capture API constraints, thereby winnowing the number of candidate programs. It also uses neural network models to predict API calls, along with the Ray distributed computing framework to scale the program search.
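A rough sketch of how a generator can encode API constraints follows. It assumes a single Pandas call, `DataFrame.pivot`, and the names `pivot_argument_generator`, `same_table` and `synthesize_pivot` are hypothetical; it does not reproduce RISELab’s generators, neural models or use of Ray.

```python
import itertools

import pandas as pd


def pivot_argument_generator(df):
    """Yield only argument combinations that make sense for df.pivot.

    Constraint encoded here: index, columns and values must each name an
    existing column of df, and the three names must be distinct.
    """
    for index, columns, values in itertools.permutations(df.columns, 3):
        yield {"index": index, "columns": columns, "values": values}


def same_table(a, b):
    """Compare labels and cell values, ignoring axis names."""
    return (
        a.shape == b.shape
        and list(a.index) == list(b.index)
        and list(a.columns) == list(b.columns)
        and bool((a.to_numpy() == b.to_numpy()).all())
    )


def synthesize_pivot(inp, out):
    """Try each generated argument set and return the first one that yields out."""
    for kwargs in pivot_argument_generator(inp):
        try:
            result = inp.pivot(**kwargs)
        except Exception:
            continue  # e.g. duplicate index/column pairs cannot be reshaped
        if same_table(result, out):
            return kwargs
    return None


inp = pd.DataFrame({
    "city": ["SF", "SF", "LA", "LA"],
    "year": [2018, 2019, 2018, 2019],
    "sales": [1, 2, 3, 4],
})
out = pd.DataFrame({2018: [3, 1], 2019: [4, 2]}, index=["LA", "SF"])

num_candidates = len(list(pivot_argument_generator(inp)))
found_kwargs = synthesize_pivot(inp, out)
print(num_candidates, found_kwargs)
```

Because the generator only proposes real column names in distinct roles, the search considers six argument sets for this three-column frame instead of arbitrary strings. A learned model could then rank those remaining candidates rather than trying them in a fixed order.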
