NSF Doubles Down on Data Science
The National Science Foundation has awarded two grants to the University of California at Berkeley, the first to deepen the theoretical foundations of data science, the second to address the big data skills gap.
The first NSF award supports the creation of the Foundation of Data Analysis (FODA) Institute, which brings together basic research on applied mathematics, theoretical computer science and theoretical statistics, the university announced.
The award also will fund two national data science workshops designed to develop curriculum “anchored in the actual practice of data science work,” organizers said. The curriculum materials, including course modules and exercises, will be publicly available.
The FODA Institute will tackle four “deep theoretical challenges,” organizers said. The first seeks to pursue and leverage a general complexity theory for inference. Improved inference tools could, for example, be applied to predictive analytics.
Other projects include “the power of stability as a computational-inferential principle,” using randomness as a resource in computational mathematics and the “principled” combination of science-based with data-driven models.
“Each of these challenges is situated squarely at the interface of theoretical computer science, theoretical statistics and applied mathematics, and the project will attempt to bridge the underlying interdisciplinary gaps to address some of the most important questions at the heart of data science today,” principal investigator Michael Mahoney noted in a statement.
The NSF data science award extends for three years. The phase one initiative is called Trans-disciplinary Research in Principles of Data Science, under which twelve new research centers will be funded with a $17.7 million investment from NSF. The centers will compete for additional NSF funding during the program’s second phase.
Meanwhile, the data science curriculum award taps into existing data science programs at UC-Berkeley, including the Berkeley Institute for Data Science. The effort to extend the curriculum of the university’s Division of Data Sciences will help meet “growing demand for deepened expertise in data science foundations, methodologies and applications” that “will define a new generation of data scientists,” the school said in announcing the NSF awards.
The Berkeley awards are part of a wider NSF effort called “Harnessing the Data Revolution.” The multi-disciplinary approach seeks, among other things, to support basic research in math, statistics and computer science to “enable data-driven discovery through visualization, better data mining [and] machine learning,” the agency said.
The NSF funding and the creation of the university’s data analysis institute anticipate the emergence of data science as a field that advances other areas of science and engineering. “Data science is emerging as a field in its own right,” said David Culler, interim dean of UC-Berkeley’s Division of Data Sciences. “New applications in a diverse range of disciplines will augment data science foundations as modern research becomes more data-intensive and data-rich.”
Among the analytics tools to emerge from UC-Berkeley is Apache Spark, which was created by the school’s Algorithms, Machines and People Lab. The team that created Spark went on to form Databricks.