R and Python: The Data Science Dynamic Duo
The language R is in the midst of a sizzling resurgence this summer. One might hypothesize that this growth is coming at the expense of Python, by far the dominant language for data science. But some evidence suggests that data scientists are increasingly using both.
“Rather than R versus Python, we focus on R and Python,” says Lou Bajuk, director of product marketing for RStudio, the Boston, Massachusetts-based provider of commercial and open source R software.
The folks at RStudio watched as the reports rolled in last year about the apparent demise of R. The TIOBE Index, which uses search terms and other measures to gauge the popularity of languages, reported that R fell from 8th place in January 2018 to 20th place in July 2019.
That drop coincided with a surge in Python's popularity, which the folks at TIOBE linked to the decline in R (as well as to waning interest in Perl). Some in the IT industry speculated that R was a dying language. Dice Insights declared that R was “probably doomed.”
But R regained that lost ground as quickly as it had lost it. By July 2020, R was soaring in popularity according to the TIOBE Index, which placed R back in 8th place, right where it was before its apparent plunge began.
Bajuk never put much stock in the R doomsayers. “I had it in my notes: ‘Don’t call it a comeback,’” he tells Datanami. “From our perspective, R never went anywhere. Yes, by some metrics [the language declined in popularity]. But from our perspective, we’ve seen R consistently growing. Demand has been consistently strong.”
RStudio, which employs around 175 people, has thousands of paying customers and is profitable, according to Bajuk. Last week, the company announced what amounts to an OEM deal with Qubole that makes RStudio Server Pro and Shiny Server first-class options for Qubole data lake customers.
Before engaging with RStudio, the folks at Qubole undertook a study to gauge the popularity of R. They found that, contrary to the rumors of R’s demise, R remains an integral part of the data science processes of its customers and prospects. In fact, Qubole concluded that, given the choice, organizations would rather use R and Python together.
That jibes with what RStudio has seen in the data science community. “We see both [R and Python] as powerful, both with unique strengths and options,” Bajuk says. “Both help drive data science insights, and from our perspective it’s not R or Python, it’s open source data science. That’s the focus of our company.”
RStudio has taken steps to enable customers to use both Python and R. In addition to supporting Python itself, the company supports tools such as Jupyter notebooks, the Streamlit framework, and the Bokeh visualization library, Bajuk says.
“We think both are great. Python has become very popular and that’s part of the reason we have embraced it within our products,” he says. “But at the same time, we see R being very powerful.”
You’ll get no argument from Bajuk that R and Python are good at different things. R has a longer history in statistical computing than Python and a more mature collection of statistical packages and libraries. At last count, there were more than 13,000 packages available in the Comprehensive R Archive Network (CRAN). “The deeper analytics bench is behind R,” he says.
R is the choice if you want to do some quick data analysis and generate some visualizations, Bajuk says. “R is really great at communication, really great at creating visualization, at creating applications, like with the Shiny Server,” he says, referring to the RStudio offering designed to make it easy to publish Web applications and documents based on R work.
R is also a great orchestration language, a strength that Bajuk traces back to R’s roots in the S language developed at Bell Labs in the 1970s. “People like to use R to tie together these different things,” he says.
Python’s history as a general-purpose language has given it a larger overall following than R, Bajuk says. In a December 2019 blog post, he wrote that Python is easier to deploy, integrate, and scale than R. “Python is great at data engineering,” he says, a strength that makes the language well suited to creating and managing ETL and machine learning workflows.
“Certainly, Python has the advantage that more people overall know Python because Python is used for lots of different things, so Python has become very popular for data science,” Bajuk says. “But in general, we kind of stay out of the R vs Python world. Rather than R versus Python, we focus on R and Python.”
Everyone has personal preferences when it comes to which language to use, so RStudio’s approach is to support both R and Python within its products. The company recently posted a video titled “R and Python: A Love Story” to explain its stance on the topic.
At the end of the day, Bajuk welcomes any new interest in R, which RStudio will continue to support steadfastly. “I think there is some element of a resurgence recently,” he says. “Despite waves of buzz going back and forth, we don’t feel like R went anywhere, so we’re happy that the buzz turned back to R.”