Python Eats Into R as SAS Dominance Fades
A new survey of data science tools shows that Python usage is quickly gaining steam among advanced analytics professionals, at the expense of both R and SAS.
Last month the executive recruiting firm Burtch Works published the latest results of an ongoing “flash survey” of tooling preferences of its network of analytics professionals. It was the third straight year that the company has conducted the study, and the results show a definite trend towards open source alternatives and away from the longstanding suite of analytic tools from SAS, which are proprietary.
According to the results of the 2016 survey, R is the preferred tool for 42% of analytics professionals, followed by SAS at 39% and Python at 20%. While Python’s placing may at first appear to relegate the language to Bronze Medal status, it’s the delta here that really matters.
Here’s the interesting bit: While the first two years of Burtch Works’ survey were focused on the SAS vs. R war, so many analytics professionals chose to write in Python that the company was forced to include the language as a third choice. “Last year, we received enough write-in requests to include Python in our survey that we decided to evolve with the times!” the company writes in its blog announcing the results. (Bernie supporters, take note.)
There were a few other interesting tidbits from the survey. For instance, among those professionals who identify as “data scientists,” Python is the tool of choice for 53% of survey respondents, while SAS garnered a tiny 3% share. Among those who identify themselves as “predictive analytics” professionals, SAS and R were in a virtual tie (43% vs. 41%, respectively), while only 16% prefer Python.
There’s a correlation between the education level of the survey respondents and their tool preference, with R usage increasing with the amount of postgraduate education. Interestingly, SAS usage increases with the number of years in the analytics saddle. This makes sense when you consider that analysts who forwent a more advanced degree were more heavily exposed to SAS, which has been so dominant in the private sector for so long, while those who stayed in school were more heavily exposed to open source alternatives, like R.
SAS is used more heavily in industries like financial services, healthcare, and retail, while R is favored in the high tech, telecom, and consulting sectors. There’s also an open source (read: R and Python) bias in the tech-heavy West Coast and Intermountain West regions, while SAS dominates in the more entrenched industries of the Midwest, the Southeast, and the Northeast.
While open source tools as a whole continue to gain steam against proprietary setups among data science pros, the big story here is the emergence of Python as a major force on the analytics stage.
Python was first developed in the early 1990s, with roots in C. It’s widely considered to be easier to learn than R, and its status as a general purpose language makes it a relatively simple matter to build statistical functions into existing applications.
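As a rough illustration of that last point, the sketch below adds summary statistics to an ordinary application function using only the `statistics` module from Python’s standard library. The function name `summarize_order_totals` and the sample figures are hypothetical, invented for this example; nothing here comes from the survey.

```python
import statistics


def summarize_order_totals(totals):
    """Return basic descriptive statistics for a list of order totals.

    Hypothetical helper: stands in for any existing application code
    that needs a few statistical measures added to it.
    """
    return {
        "count": len(totals),
        "mean": statistics.mean(totals),
        "median": statistics.median(totals),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(totals) if len(totals) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Made-up sample data for demonstration only.
    print(summarize_order_totals([10.0, 20.0, 30.0, 40.0]))
```

Heavier statistical work would more likely lean on third-party packages, but even this small case shows how little glue code sits between application logic and analysis.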
Python isn’t new, but it’s just now appearing to gain steam in the analytics community, at the expense of R and proprietary packages like SAS, IBM’s SPSS, and MathWorks’ MATLAB. The gradual learning curve and availability of analytic application development environments like Jupyter and the IPython Notebook are helping to spur adoption.
On the other hand, R is also widely used in the data science community. It’s widely considered to be superior for pure statistical analysis, partly because of the packages available through the Comprehensive R Archive Network (CRAN), as well as notebooks like R Markdown and the R tools that Microsoft acquired with Revolution Analytics. However, the steep learning curve and lack of applicability outside the statistical community are seen as limiting factors for R.
One of the biggest supporters of Python is Travis Oliphant, the CEO and co-founder of Continuum Analytics, which develops the Anaconda suite of Python-based tools for data science and advanced analytics. Oliphant was the primary developer of NumPy, a numerical computing package that extends Python, and contributed to the open source Anaconda tools that Continuum develops.
“The nice thing about Python is it’s very flexible and you can create interfaces that are closer to what you’re used to,” Oliphant told Datanami in a recent interview. “SAS does a lot of good things. The challenge is that it’s expensive and proprietary.”
Another advantage of Python is its ease of deployment on the Web. Oliphant cites Python’s integration with Web application frameworks like Django and cloud platforms like Heroku as a major advantage over “legacy” systems such as SAS.
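To give a sense of what that can look like, here is a minimal sketch that exposes a pair of summary statistics over HTTP. It does not use Django or Heroku; it relies on WSGI, the standard Python web interface that Django applications also implement, and on the standard library’s `wsgiref` reference server. The names `app`, `serve`, and `SCORES`, along with the sample numbers, are assumptions made for illustration.

```python
import json
import statistics
from wsgiref.simple_server import make_server

# Hypothetical sample data; a real application would load its own.
SCORES = [72, 85, 90, 64, 88]


def app(environ, start_response):
    """Minimal WSGI application that returns summary statistics as JSON."""
    body = json.dumps({
        "mean": statistics.mean(SCORES),
        "median": statistics.median(SCORES),
    }).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(host="127.0.0.1", port=8000):
    """Run the app on wsgiref's development server (not for production)."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` starts a local development server. A production deployment would typically put a framework and a production-grade WSGI server in front of the same kind of callable.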
Python’s surge is also visible in the various rankings of programming languages. In the last two-and-a-half years, Python’s popularity has just about doubled according to the TIOBE Index, which shows Python as currently the fifth most popular language (R is 17th). RedMonk’s twice-yearly report on programming languages shows Python among the top tier of languages too.
There are a lot of variables that affect which languages and tools an application developer chooses to use. In the world of data science, it’s apparent that many of these factors are increasingly driving developers to Python.