Is Data Science the Fourth Pillar of the Scientific Method?
Nvidia CEO Jensen Huang revived a decade-old debate last month when he said that modern data science (AI plus HPC) has become the fourth pillar of the scientific method. While some dispute the notion that statistical analysis alone can reveal undiscovered laws, the argument may be moot if data science continues on its current course as an extremely useful and in-demand tool for all manner of scientific discovery.
Huang made his statement about data science being the fourth pillar of science during his keynote address at the Nvidia GPU Technology Conference (GTC) 2019 show, which attracted several thousand people to the San Jose State University event center.
“Data science is the fastest growing field of computer science today,” Huang said. “Because of several different factors, it has become the fourth pillar of the scientific method.”
The first pillar of the scientific method is the experimental method, such as Isaac Newton’s Laws of Motion. The second pillar is the theoretical. An Albert Einstein thought experiment is an example of this form of scientific discovery.
The third pillar of the scientific method, per Huang, is computer simulation. Scientists developing advanced models that describe molecular dynamics are an example of this type of scientific endeavor.
“And now we have a data-driven, data-science method, and it’s made possible because of three factors,” Huang said. Those three factors are the generation of big data, breakthroughs in machine learning and deep learning algorithms, and high performance computers.
“The availability of data, machine learning algorithms, and HPC has made it possible for us now to use this as the fourth pillar of scientific discovery,” he said. “These three factors continuously feed on each other and now data science is a pillar of the scientific method…We’re solving problems that were just previously impossible.”
Huang’s words echoed the content of a 2009 book, titled “The Fourth Paradigm: Data-Intensive Scientific Discovery,” which was published by Microsoft Research. The book is a collection of essays that expand on the ideas of Jim Gray, a pioneering computer scientist with Microsoft Research who disappeared at sea in 2007.
In the foreword, Microsoft Researcher Emeritus Gordon Bell reminds us that it was Tycho Brahe’s assistant, Johannes Kepler, who discovered the laws of planetary motion by analyzing Brahe’s astronomical data. “This established the division between the mining and analysis of captured and carefully archived experimental data and the creation of theories,” Bell wrote. “This division is one aspect of the Fourth Paradigm.”
One thinks that Gray might be pleased at the progress that has been made in the development of tools for capturing, curating, analyzing, and visualizing data, which he identified as the areas of greatest need for research in one of his last public keynote addresses in 2007 (that keynote was adapted for the first essay in “The Fourth Paradigm”).
“…[A]lmost everything about science is changing because of the impact of information technology,” Gray wrote. “Experimental, theoretical, and computational science are all being affected by the data deluge, and a fourth, ‘data-intensive’ science paradigm is emerging. The goal is to have a world in which all of the science literature is online, all of the science data is online, and they interoperate with each other.”
But not everybody agrees that data science is the fourth pillar of the scientific method. One skeptic is Oliver Schabenberger, the COO, CTO, and executive vice president at data analytics giant SAS, who previously taught statistics at Michigan State University and Virginia Tech.
“I’m a scientist by training, so the scientific method, which I taught my students, is near and dear to my heart,” Schabenberger told Datanami in an interview. “I don’t see data science as the fourth pillar of science. I see this as a method of getting insight from data. We called it statistics before. We called it data mining before. So what is essentially different?”
According to Schabenberger, not much has changed except the methods we use to find insights. “It’s the types of the methods we use [that changed], the size of the data that’s different, and it’s the way we think about data that’s changed,” he said.
While data science tools have come a long way, we should be careful not to assign too much value to the results of the analysis, he said. It’s important to have
“I think we’re losing a little bit,” Schabenberger said. “We’re losing it for thinking and understanding of uncertainty, of confidence. Nobody is talking about precision or standard error anymore. Everybody is just talking about misclassification rates. Those systems are statistically impressive, but they can also be individually highly unreliable.”
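Schabenberger's point about standard error can be made concrete with a small sketch. The Python below is an illustration only, not anything SAS or Schabenberger supplied: the function name `misclassification_summary` and the sample counts are hypothetical, and it assumes test cases are independent so that the simple binomial (normal-approximation) standard error applies. It reports a misclassification rate together with its standard error and an approximate 95% interval.

```python
import math


def misclassification_summary(errors: int, n: int, z: float = 1.96):
    """Return (rate, standard_error, (low, high)) for an observed error rate.

    Assumes independent test cases and uses the normal approximation to the
    binomial; z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need n > 0 and 0 <= errors <= n")
    rate = errors / n
    standard_error = math.sqrt(rate * (1 - rate) / n)
    # Clamp the interval so it stays within the valid range for a proportion.
    low = max(0.0, rate - z * standard_error)
    high = min(1.0, rate + z * standard_error)
    return rate, standard_error, (low, high)


# Hypothetical counts: the same 5% error rate measured on two test-set sizes.
small = misclassification_summary(errors=5, n=100)
large = misclassification_summary(errors=500, n=10_000)

for label, (rate, se, (low, high)) in (("n=100", small), ("n=10,000", large)):
    print(f"{label}: rate={rate:.3f}, SE={se:.4f}, ~95% CI=({low:.3f}, {high:.3f})")
```

Both runs report the same headline rate, but the smaller test set carries roughly ten times the standard error, which is the kind of uncertainty a bare misclassification rate leaves out.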
Ben Lorica, the chief data scientist at O’Reilly Media, also expressed some skepticism about data science being the fourth pillar of the scientific method.
“I would say that the use of data in machine learning will be useful for scientists, but that doesn’t remove the need for causal explanations and theory,” Lorica told Datanami in an interview at the recent Strata Data Conference. “It might point you in the right direction, where you should be spending more of your time to find the theoretical foundation…. It can point you. But at the end of the day, it will be just a pointer.”
Lorica said deep learning itself “also needs a lot of theoretical work, when and why it works and doesn’t work.” He added: “The heartening thing [is] there are people who are theoretical scientists who are now engaged in machine learning, so they will start telling us the limitations of these approaches.”
There are obviously differences of opinion on whether data science is the fourth pillar of the scientific method. But interestingly, there are some folks in the scientific community who question whether there are any more than two.
Back in 2010, following the release of “The Fourth Paradigm,” Moshe Vardi, the editor in chief of Communications of the ACM, the magazine of the Association for Computing Machinery, penned a letter questioning the idea that data mining is the fourth leg.
“I find myself uncomfortable with science sprouting a new leg every few years,” Vardi wrote. “In fact, I believe that science still has only two legs—theory and experimentation. The ‘four legs’ viewpoint seems to imply the scientific method has changed in a fundamental way. I contend it is not the scientific method that has changed, but rather how it is being carried out.”