New Open Source App: Data Science Education
The open source development principles that are transforming the digital economy are now being extended to new sectors such as education, where proponents hope a collaborative approach will advance the teaching of data science.
An open source project shepherded by the Linux Foundation aims to accelerate data science curricula while benefitting from the contributions of students and teachers. OpenDS4All is funded by IBM (NYSE: IBM) and is being developed by the University of Pennsylvania. The effort would give educators free access to information needed to develop data science coursework. In return, successful approaches would be folded back into what project promoters call “constantly evolving and improving” curricula.
A starter “curriculum kit” includes a set of open source building blocks that could be used to launch data science programs. Based on the Python programming language, the tools and frameworks include code, documentation and data sets, organizers said.
The theory behind OpenDS4All is that anyone can use those materials and improve them along the way.
OpenDS4All is part of a Linux Foundation project called ODPi that seeks to create open source standards for data governance, business intelligence and data analytics.
The open source initiative seeks to go beyond online university courses by combining lectures with classroom activities and practical data science assignments. Along with access to data sets and code, the curriculum builder includes sample Jupyter notebooks and other materials that can be used to expand data science and data engineering education.
Initial modules will be aimed at undergraduate and graduate students. According to the OpenDS4All GitHub page, “The expectation is that instructors will be generally fluent in basic database and machine learning concepts.”
Data science students must demonstrate proficiency in programming with Python along with familiarity with probability theory and statistics.
Among the data science categories to be covered are: data wrangling and integration; exploratory data analysis; data and knowledge modeling; scalable data processing; machine learning; model assessment; and ethics.
Organizers said other universities have expressed interest in the open source education initiative, either as a means of launching new data science programs or embedding materials into existing computer science programs.
“Shared education is an emerging and important frontier for open source,” said John Mertic, the Linux Foundation’s director of program management.
Zachary Ives, chair of the University of Pennsylvania’s Computer and Information Science Department, added that the open source initiative reflects how data science is “rapidly evolving as a new academic discipline [tying] together ideas from many different sub-areas,” including foundational computer science and data analytics applications.
The open source approach is the latest attempt to meet a widening skills gap and growing demand for qualified data scientists. Recent surveys show that corporate demand continues to outstrip supply, with median base salaries for data scientists estimated at $108,000.