August 26, 2014

Poll: SAS Use Surges for Data Mining

George Leopold

A recent poll querying data scientists on which programming and statistics languages they used in 2014 for analytics, data mining and data science found that four main languages dominated.

The data mining community web site KDnuggets reported earlier this month that respondents identified R, SAS, Python and SQL (in that order) as their preferred programming languages. Fully 91 percent of respondents used at least one of the four languages. The R programming language led the way, cited by 49 percent of respondents to the community poll. It was followed by SAS (36.4 percent), Python (35 percent) and SQL (30.6 percent).

Respondents also identified combinations of languages they preferred, including R and SQL (22 percent).

The survey noted a large increase in SAS user participation in 2014, “perhaps partly driven by growth and change in KDnuggets readers composition, and likely also by increased visibility of this poll among SAS users.” The site also reported that SAS users had a high percentage of “lone” votes: 58 percent of them said they used only SAS in 2014, compared with 26 percent last year.

The fraction of “lone” votes in 2014 was 20.5 percent for R, 14 percent for Python and only 4.5 percent for SQL, the poll found.

While programming language use appears to be consolidating around those four languages, the survey found declining use of other languages for data mining tasks. They included Java, Unix shell, MATLAB, C/C++, Perl, Octave, Ruby, Lisp and F#.

F#, or F Sharp, showed the greatest decline in usage among programmers working on data mining, with no respondents saying they used it in 2014. Meanwhile, C/C++ usage declined 60 percent over last year while MATLAB dropped 50 percent, the poll found.

Along with SAS (up 76 percent), the biggest gainers in the 2014 poll of programmers were Julia (up 316 percent) and Scala (up 74 percent).

The survey also tracked the overlap in programming language preference, revealing that the R and SQL languages were being used by 22 percent of respondents while 20 percent said they were using both R and Python. Python and SQL followed at 13 percent while 10 percent of respondents said they had used R, Python and SQL over the past year.

The KDnuggets poll attracted 719 respondents in 2014, up slightly from the previous year. Of the 713 replies in 2013, 60.9 percent said they used the R language. The majority of respondents (51.6 percent) worked in North America, while 26.7 percent worked for European enterprises. Asia accounted for 13.3 percent of respondents this year, while Latin America, Africa and the Middle East, along with Australia and New Zealand, accounted for the remaining respondents.

One respondent questioned the inclusion of SQL in the polling, arguing it was a database language that should not be considered in the same category as R or Python. The survey author responded that SQL was included in the polling as part of a data science workflow to connect databases. Besides, he added, SQL remains “very popular, as the [poll] results show.”
