January 21, 2014

How To Boost Your Big Data Salary

Alex Woodie

The best way to boost your salary in 2014 may be to learn a new data analytics tool. According to the 2013 Data Science Salary Survey from O’Reilly, the number of tools that data scientists and analysts used correlated strongly with their salaries. What’s more, those who used open source tools, such as R and Hadoop, tended to bring home more bacon than those who used commercial products, such as SAS and Teradata.

O’Reilly’s salary survey found a positive correlation between the number of tools a respondent used and salary level. The survey, which is based on data from two of O’Reilly’s Strata events in 2013, found that respondents who reported using 10 tools had a median income of $100,000, while those who used 15 or more tools had a median salary of $130,000.
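For readers who want to run the same kind of comparison on their own team or survey data, here is a minimal Python sketch of a median-salary-by-tool-count breakdown. It is not O’Reilly’s method or data: the `responses` list and the function name are invented for illustration, and the numbers were chosen only so that the medians match the two figures quoted above.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (tools_used, salary) pairs, invented for illustration only.
# They are NOT survey records; they were picked so the medians come out
# to the $100,000 and $130,000 figures quoted in the article.
responses = [
    (10, 95_000), (10, 100_000), (10, 110_000),
    (15, 120_000), (15, 130_000), (15, 145_000),
]

def median_salary_by_tool_count(rows):
    """Group salaries by number of tools used and return each group's median."""
    groups = defaultdict(list)
    for tools_used, salary in rows:
        groups[tools_used].append(salary)
    return {count: median(salaries) for count, salaries in sorted(groups.items())}

print(median_salary_by_tool_count(responses))
```

A median is used here, as in the survey, because a few very high salaries would pull a simple average upward.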

The number of tools used is important, but so is the type of tool. O’Reilly grouped the tools into two clusters:

  • the “Hadoop” cluster of tools, which consists of open source tools like Hadoop, R, Python, Java, Hive, Pig, Mahout, the Cassandra NoSQL database, graph databases, and several scalable machine learning tools;
  • and the “SQL/Excel” tool cluster, which consists of commercial tools such as SQL, Excel, Microsoft SQL Server, Oracle’s RDBMS, DB2, Teradata, SAS, Tableau, Cognos, and SAP’s BusinessObjects.

The authors of the report then sliced and diced tool usage to come up with some interesting observations. For starters, those using open source tools had higher salaries than those using commercial tools. “For example, respondents who selected 6 of the 19 open source tools had a median salary of $130,000, while those using 5 of the 13 commercial cluster tools earned a median salary of $90,000,” the report says.

The authors, John King and Roger Magoulas, had some ideas on why this is the case. “We suspect that a scarcity of resources trained in the newer open source tools creates demand that bids up salaries compared to the more mature commercial cluster tools,” they write.

There is one caveat: while Hadoop tool usage and salary were positively correlated, SQL/Excel tool use was correlated only slightly (in the negative direction) with salary. The one tool that tended to buck the trend in the SQL/Excel group was Tableau Software.

“Tableau is an outlier in the correlation graph, somewhat bridging the two clusters, as Tableau correlated with R, Cloudera, and Cassandra usage,” the authors write. “Tableau is one of the few SQL/Excel tools that correlates positively with salary.”
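The cluster-level correlations described above can be illustrated with a short Python sketch. The article does not say which correlation measure the report used, so the Pearson coefficient here is an assumption, and every number in the three lists is invented to show one strongly positive and one slightly negative relationship.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical per-respondent values, invented for illustration only.
hadoop_tools = [0, 2, 4, 6, 8]        # tools used from the "Hadoop" cluster
sql_tools = [3, 4, 3, 4, 3]           # tools used from the "SQL/Excel" cluster
salary_k = [80, 90, 115, 120, 140]    # salary in thousands of dollars

print("Hadoop cluster vs. salary:", round(pearson(hadoop_tools, salary_k), 2))
print("SQL/Excel cluster vs. salary:", round(pearson(sql_tools, salary_k), 2))
```

With these made-up inputs, the first coefficient comes out strongly positive and the second slightly negative, which is the pattern the authors describe. As with any correlation, neither number says that learning a tool causes a raise.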

In addition to generally making more money, those who work with open source tools tend to work with a greater number of tools, and those who work largely with commercial tools tend to stick with a smaller number of tools. In other words, data scientists who use more commercial tools than open source tools tend to use their commercial tools in isolation. And, as the survey found, they also tend to make less money.

The authors added one more caveat, which we suppose data science types are wont to do. “Sampled from attendees at two conferences, these results capture a particular category of professionals: those who are heavily involved in big data or highly motivated to become so…” they write. In other words, these event-goers aren’t your typical garden-variety data analysts and data scientists, but tend to be hungry to learn new stuff.

Despite that, “it seems very likely that knowing how to use tools such as R, Python, Hadoop frameworks, D3, and scalable machine learning tools qualifies an analyst for more highly paid positions–more so than knowing SQL, Excel, and RDB platforms,” they write. “We can also deduce that the more tools an analyst knows, the better: if you are thinking of learning a tool from the Hadoop cluster, it’s better to learn several.”


 

 
