October 23, 2014

Why Pay for Analytics When Open Source Is ‘Free?’

David Pope

The free analytics question comes up over and over again, especially as it pertains to open source analytic offerings. I don’t think a day goes by when someone at a company doesn’t ask, “Why should I pay for analytics when I can use (fill in the blank) open source statistics/analytics?”  You will find many lengthy discussions all over the Web on just this topic.  Often the discussion is much more impassioned than it needs to be. I will address the open source issue from a somewhat different angle.

We all know open source has its place. While some industries and some companies have policies in place to limit or restrict the use of open source in production or critical applications, research has shown that many companies are using both commercial and open source analytics in tandem. One typical scenario is a company using R for experimental model development and commercial analytics for operational model deployment.

What operating system do you use? Windows, UNIX, Linux? If you use Linux in a typical business setting the next question is: Do you use SUSE or Red Hat? It’s highly unlikely you are using a free distribution downloaded from the Web. You are more likely paying for a supported distribution of your “free” open source operating system.

Okay, now what database or database platforms are you using? List your favorite: Oracle, Microsoft SQL Server, Teradata, Pivotal, IBM DB2, etc. Once again, you are paying for databases when there are free, downloadable open source databases that you could be using.

Why are companies paying for DBMS and operating systems when free options have long been available? Because the supported versions offer increased value and reduced risk by providing support, documentation, training, and services that allow IT departments to meet business requirements and needs. The same holds true for commercially developed analytics software.

The “new” open source big data platform Apache Hadoop is very attractive today in the big data arena because it provides a low-cost option for storing large amounts of data for longer periods of time than the traditional storage systems used to keep business operations running day to day. But even with Hadoop, two of the leading distributors, Cloudera and Hortonworks, are prominent commercial software vendors. Much like with Red Hat or SUSE, enterprises are investing in a commercially supported version of Hadoop in order to reduce risk and to get support, documentation, and additional functions layered on top of the Hadoop Distributed File System (HDFS).

A primary reason companies work with vendors like Cloudera or Hortonworks is the additional functionality these vendors add around workload management for the jobs that process the data contained in Hadoop, as well as the new layers and tools they develop to make it easier to load, unload, and keep track of what data is stored in Hadoop. These are the “extras” companies are willing to pay for in order to make Hadoop more enterprise ready.

You can use Hadoop to store data at a lower cost. However, it is very uncommon for the data you store in Hadoop to be in the format needed for efficient analytic processing. Do businesses keep their doors open just because of how efficiently they store data? No, unless they are data storage providers. They stay in business because of the forward-looking insights they get from data and how quickly they are able to get them. That is exactly what analytics provides.

Once data is stored in a commercial Hadoop distribution or other system, the next step is better data preparation for analysis, followed by the analytic software itself. The analytics must work in cooperation not only with a Hadoop or big data platform, but also with existing operating systems. They must be able to work with data read from Hadoop, alongside Hadoop, and inside Hadoop.

I believe that mathematical and statistical algorithms are the true “open source” or “free” equivalent. Any company that develops software implementing analytics, whether it builds its own versions or leverages open source analytics and sells services, is in essence doing the same thing: It is being paid for the value that analytics brings to its customers.

And the value of analytics goes way beyond just the analytics engine. Understanding how to implement the necessary process changes, and the skills your people will need to properly prepare data for analytics and then operationalize the results, is arguably much more important from a business value perspective than the analytics standing alone. The bottom line is that whatever software works for your organization is what you should use.


About the author: David Pope is the technical team lead for SAS Institute’s oil, gas, and utility business in the U.S. He has more than 23 years of business experience in advanced analytics working in R&D, IT, marketing, and sales. He developed expertise in big data analytics and enterprise architecture across several industries including finance, communication, healthcare, government, and education prior to focusing on the energy industry. He graduated Magna Cum Laude from North Carolina State University with a BS in Industrial Engineering and a Certificate of Computer Programming. He has presented at SAS Global Forum, IBM’s Information on Demand, and EMC World. In addition, he holds 5 U.S. patents and blogs frequently for his company.

