June 25, 2012

The Double-Edged Sword of Open Source

Ian Armas Foster

Unlocking big data’s potential is a profitable but costly venture, and in many cases only the largest data-collecting companies have the ability to fully exploit the possibilities that lie under the mounds of information.

But open source may be leading to an opening for smaller companies to reap the benefits of big data, at least in the opinion of Bertrand Diard, Talend’s co-founder and CEO.

Not surprisingly, Diard focuses on Hadoop as the most prominent and successful example of open source finding its way into the wells of big enterprise data. “Today big data is largely centered on leveraging the open source Apache Hadoop platform and the innovation coming out of the companies supporting or extending it like Cloudera, Hortonworks and MapR,” Diard writes. “This is where the center of IT innovation is now, and these emerging companies are completely disrupting large software companies such as IBM and Microsoft.” Diard recognizes the importance of co-operation in advancing the capabilities of open source big data platforms such as Hadoop.

Open source co-operation allows these companies to behave more like scientists perfecting a newly discovered law in that the current platform can be tweaked and modified for everyone’s benefit. Indeed, Diard likens the companies and vendors working on Hadoop to a community much like a scientific community. Even though all the companies are technically competing against each other as “customers will only select one vendor partner for a given deployment,” it is in everyone’s best interest to have the ability to analyze big data.

It is only natural that larger sites such as Yahoo and Facebook are contributing to the Hadoop platform, building solutions to problems they and their data helped create. While Yahoo and Facebook derive enormous benefit from analyzing data that is useful to them, they also share their solutions with the open source community. After all, sharing with people means people will share with you.

The second half of Diard’s piece offers slightly contradictory points. The first is that big data, when built on open source platforms, levels the playing field. “Hadoop… has democratized data, turning it into a competitive advantage no longer just reserved for the big guys. It brings big data to the masses, and that is thanks to the open source nature of Hadoop.” However, Diard goes on to say that Hadoop’s full potential is realized only by companies like Facebook and Google that have the resources to exploit it. “Organizations that expect to leverage big data not only have to understand the intricacies of foundational technologies like Hadoop, but need the infrastructure to help them make sense of the data and secure it. Without these complementary capabilities, big data will remain an IT privilege and remain out of the reach of business people and the lines of business that they represent.”

Contradictory or not, what resonates here is the larger point about open source. As it stands right now, using Hadoop properly takes extensive experience and remarkable ingenuity, along with an infrastructure large enough to handle it. However, the hope is that open source work will narrow that gap more efficiently than would be possible without it.

Diard sees open source opening the full capabilities of big data to enterprises big and small. Those with Hadoop are working on it and sharing their work on the open source market, a process Diard hopes will advance the big data market and level the playing field for those using it.
