August 15, 2019

Is Python Strangling R to Death?

As programming languages go, there’s no denying that Python is hot. Originally created as a general-purpose scripting language, Python somehow became the most popular language for data science. But is Python’s fame coming at the expense of R? Yes, according to some folks in the IT industry, who say R is a dying language.

There is some evidence that Python’s popularity is hurting R usage. According to the TIOBE Index, Python is currently the third most popular language in the world, behind perennial heavyweights Java and C. From August 2018 to August 2019, Python’s rating (TIOBE’s proprietary metric, which primarily measures search activity) rose by more than 3 percentage points to reach 10%, easily the biggest gain among the 20 most popular languages.

R, by contrast, has not fared well lately on the TIOBE Index, where it dropped from 8th place in January 2018 to 20th place today, behind Perl, Swift, and Go. At its peak in January 2018, R had a popularity rating of about 2.6%. Today it is down to 0.8%, according to the TIOBE Index.

“Python’s continuous rise in popularity comes at the expense of the decline of popularity of other programming languages,” the folks behind the TIOBE Index wrote in July. “One of these programming languages is R, but Perl has been beaten even more.”

R’s peak popularity occurred in January 2018, according to the TIOBE Index

This led some pundits to declare the demise of R. Dice Insights, an online publication connected to the popular tech salary site, declared in a July article that R was one of five languages that are “probably doomed.”

“Although R is still used by academics and data scientists, companies interested in data analytics are turning to Python for its scalability and ease of use,” writes Nick Kolakowski, a senior editor with Dice Insights. Relying on usage by a “handful of academics and nobody else” may not be enough to keep R alive, he wrote. “That’s not viable.”

Other data points suggest Python’s success over the years has come at the expense of R and SAS, the popular proprietary analytics environment. As part of its salary survey, Burtch Works has been asking data scientists and analysts which environment they prefer: R, Python, or SAS.

In 2014 and 2015, SAS still dominated but R was quickly gaining steam. From 2016 to 2018, however, Python snaked its way into the mix. And in 2018, Burtch Works declared a three-way dead heat among Python, R, and SAS. Other polls have documented similar movements.

Python clearly has the momentum, but Martijn Theuwissen, the co-founder of DataCamp, rejects the assertion that R is dead or dying. “Reports of R’s decline are greatly exaggerated,” he says. “If you look at the growth of R, it’s still growing. Based on what I observe, Python is growing faster.”

Interest in R was surging when Theuwissen founded DataCamp in 2013 with Dieter De Mesmaeker and Jonathan Cornelissen. In fact, R was the core focus at DataCamp, which provides education and training in data science, data analysis, and machine learning. Since then, interest in Python has exploded, and today DataCamp offers courses in Python and R, in addition to Scala and SQL.

Python, R, and SAS were in a dead heat in 2018, according to the Burtch Works survey of preferred modeling environments

“When we started the company six years ago, R for data science saw massive growth,” Theuwissen says. “We didn’t hear a lot of Python. But two years in [2015], we heard Python being used more and more for data science.”

Trying to measure the popularity of languages is a notoriously difficult task. While languages do have a natural life, there’s no foolproof way to pinpoint where they are on the lifecycle at any particular point. And of course, there’s no way to predict the future with anything approaching full certainty (not even in MATLAB, currently #12 on TIOBE).

When Guido van Rossum first conceived of Python back in the late 1980s, the goal was not to create the world’s most popular language for data science. Who could have predicted that Python would get a mid-life bump to become the lingua franca for data science and machine learning?

In contrast to Python, R was conceived to be a language for statistical computing. From the get-go in the early 1990s, R creators Ross Ihaka and Robert Gentleman of the University of Auckland sought a way to marry the structures of S with a good user interface to allow academics, engineers, and others to build statistical models and analyze data.

“We both had an interest in statistical computing and saw a common need for a better software environment in our Macintosh teaching laboratory,” Ihaka wrote in his 1998 paper “R: Past and Future History.” “We saw no suitable commercial environment and we began to experiment to see what might be involved in developing one ourselves.”

R has spread far beyond New Zealand over the past three decades. There are currently more than two million users of R around the world, according to the R Consortium, a group created to promote the use of the open source language. Developers have written and open sourced more than 13,000 libraries via CRAN to automate a variety of statistical tasks and to plot graphs.

“A broad range of organizations have adopted the R language as a data science platform, including biotech, finance, research and high technology industries,” the R Consortium says on its website. “The R language is often integrated into third-party analysis, visualization and reporting applications, and runs on a wide variety of computing platforms.”

One of R’s benefits is that it’s widely taught in colleges and universities. Many graduate students pursuing scientific degrees in a number of disciplines learn R for statistical modeling. As demand for data scientists grew, many of these individuals trained in “hard” sciences parlayed their statistical abilities into the new data science trade, and brought with them their R knowledge.

Python is also taught in higher education, but it’s more apt to be found in the computer science department than in astronomy or wildlife biology. As these college graduates join the workforce and become software engineers, Python is one of the languages in their quivers. And when these folks transition into data science roles, it’s only natural they lean more heavily on Python.

In a Reddit discussion titled “Is R a dead end street?” individuals compare and contrast the various technical benefits of R versus Python. One theme that appears repeatedly is that, while users may be able to accomplish just about any statistical task natively within R or one of its libraries, there’s concern the language just hasn’t kept up with Python, particularly when it comes to working within a Web browser. (Perceived limits to R’s scalability are another common theme among R’s detractors.)

Python, on the other hand, has benefited greatly from the availability of statistical libraries like NumPy, and is at home when used within a browser, some Reddit users wrote. Python is readily used for modeling within data science notebooks such as IPython and Jupyter. R can also be used within data science notebooks like Jupyter, but Python is the default mode.

DataCamp’s Theuwissen says the Python ecosystem is outgrowing the R ecosystem. Since Microsoft acquired Revolution Analytics in 2015, the number of data science and analytic software vendors focusing on R has shrunk.

“You have RStudio, and other than that, I don’t think there are large companies driving the language forward,” he says. “With Python, there’s a lot of people contributing to the ecosystem.”

Microsoft still develops and supports the R software that it obtained with its Revolution Analytics deal, in addition to distributing an open version called Microsoft R Open. However, the core R product no longer bears the R name, and Python is the reason why. “The product was renamed from R Server to Machine Learning Server to reflect the addition of Python-based analytics,” the company wrote in a blog post last month.

The data science tent is large and varied. While the community has clearly shown an affinity for Python in the last couple of years, that doesn’t diminish the contributions of other languages or the potential for individuals to do good work in other environments.

Python’s future may be brighter at this particular time than R’s, but that doesn’t mean R has no future. Python may be the best tool for some data science jobs, but for others, it’s hard to beat R.

Datanami