Left for Dead, R Surges Again
Don’t look now, but R, which some had written off as a language in terminal decline in light of Python’s immense and growing popularity, appears to be staging a furious comeback the likes of which IT has rarely seen.
According to the TIOBE Index, which tracks the popularity of programming languages (as expressed in Web searches), R has risen an unprecedented 12 spots, up from number 20 in the summer of 2019 to number 8 on its list today.
That’s a huge move, particularly in light of the continued domination of Python as the language of choice for data science. A recent report on data science tools by Anaconda found that 75% of data scientists and analysts report using Python “always” or “frequently,” making it by far the most popular language. Only 6% of users reported not using Python, which is quite remarkable when you think about it.
To its credit, R was number two in Anaconda’s ranking, but it wasn’t really close, as only 27% of users reported using R “always” or “frequently.” It’s clear that Python continues to be the preferred language for data science, and not by a small margin.
But in light of the surge in usage detected by TIOBE, Python’s unparalleled success doesn’t necessarily mean the end of the road for R.
R certainly didn’t look good a year ago. In the August 2019 Datanami story “Is Python Strangling R to Death?”, we reported how R had fallen precipitously from the number 8 slot on the list of TIOBE’s most popular languages in January 2018 to number 20 in August 2019. Meanwhile, Python usage continued to increase (although it was not enough to displace perennial heavyweights C and Java on the list).
“Python’s continuous rise in popularity comes at the expense of the decline of popularity of other programming languages,” the folks behind the TIOBE Index wrote in July 2019. “One of these programming languages is R…”
My, what a difference a year makes.
Thanks to a 1.57% surge in the number of searches for R performed on popular search engines (the main driver of TIOBE’s index), R reclaimed the number 8 spot on the July 2020 TIOBE Index.
Paul Jansen, the CEO of TIOBE Software, had some thoughts on the remarkable recovery. “Some time ago it seemed like Python had won the battle of statistical programming,” Jansen wrote, “but R’s popularity is still increasing in the slipstream of Python.”
Jansen sees two potential trends that could explain R’s recovery. First, universities and research institutions appear to favor open languages, like R and Python. “The days of commercial statistical languages and packages such as SAS, Stata, and SPSS are over.”
Second, Jansen postulates that the response to the COVID-19 pandemic has stimulated demand for statistical research, and R nicely fits the bill. “Lots of statistics and data mining need to be done to find a vaccine for the COVID-19 virus. As a consequence, statistical programming languages that are easy to learn and use, gain popularity now,” he wrote.
Nick Elprin, the CEO of Domino Data Lab, says his customers prefer to use open source technologies to develop machine learning models within the Domino data science platform.
“Certainly the open source adoption continues to accelerate. Python and R are still the predominant tools that we see,” he told Datanami recently. “Within those ecosystems, we do see more adoption of some of the deep learning tools and packages, like PyTorch and Keras, for example.”
R has been around for decades, so it’s well understood, and there are many statistical packages available for it, including an estimated 10,000 via CRAN. It’s a descendant of S, and has been widely taught in universities. But R also suffers from disadvantages. For starters, it’s not the easiest language to learn, and is considered to be more difficult to grasp than Python. It’s also single-threaded, whereas Python is multi-threaded. R needs to store the entirety of the data object in memory, which limits its usefulness when the data gets really big. Python doesn’t suffer from the same shortcomings.
R once was the preferred language for data science tasks. In 2016, a Burtch Works survey found 42% of analytics professionals preferred R, followed by SAS at 39%, and Python at 20%. Python wasn’t even on Burtch Works’ radar in 2015, and the firm was forced to include it because so many folks who took the survey wrote it in. Python has dominated every survey since.
It’s a remarkable comeback for R, to be sure, but it remains to be seen whether R can keep it up. There doesn’t appear to be any stopping the juggernaut Python, which continues to build momentum and today is the de facto standard language for advanced analytics and machine learning. Change seems to be the only constant in big data and data science, but data scientists do like having choices, and for now, R appears to be a solid number two to Python.