To Hadoop, Or Not to Hadoop? That is the Question
Once dismissed as a science project, big data analytics has become a mainstream tool for companies looking to leverage vast pools of data for competitive advantage. And Hadoop, whether cloud-based or on-premises, has achieved “golden child” status as the end-all solution to every big data problem.
Still, as industry experts caution, while there are certainly cases where Hadoop makes a lot of sense, there are also instances where, in spite of all the hype, Hadoop is simply not the best solution for the problem. For companies considering the deployment of Hadoop for data analytics, here are some examples of “when” and “when not” to use Hadoop.
Use Hadoop:
- For analyzing unstructured data – Sure, Hadoop can handle large data volumes, but Hadoop’s real advantage lies in its ability to collect and analyze a broad variety of unstructured data such as sensor data, e-mails, text documents, videos, photos, audio files, and social media data. The ability to join, aggregate and analyze multi-source data without having to structure it first allows organizations to gain deeper insights quickly: the kinds of insights that can inform new products, create a better end-user experience, and give companies an advantage in the marketplace.
- When a scalable infrastructure is needed – As the data demands of organizations increase, Hadoop’s scalable infrastructure allows servers to be added as needed to accommodate growing workloads, all without having to change data formats or applications. And with Hadoop in the cloud, virtual servers can be spun up or down within minutes to accommodate constantly changing workloads.
- For cost-effective analytics – Combining open-source software with commodity servers, Hadoop can be a very cost-effective solution for storing and analyzing large sets of unstructured data. Better still, Hadoop in the cloud allows companies to save even more money by contracting with a cloud vendor and forgoing the costs of physical servers and warehouse space.
- For large, distributed data processing where fast performance isn’t crucial – Hadoop is designed to run batch jobs that read through every file in the data set, and this process takes time. That makes Hadoop ideal for such tasks as running end-of-day reports to review daily transactions. It’s also well suited to scanning historical data and performing analytics where a short time-to-insight isn’t critical.
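To make the end-of-day report example concrete, here is a minimal sketch of the map, shuffle and reduce pattern that a batch job of this kind follows. It is plain Python that imitates the pattern in memory and does not call any Hadoop API. The record format and the function names (map_record, shuffle, reduce_group, end_of_day_report) are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical raw log lines: "time,store_id,amount". This format is an
# assumption for the sketch, not something a real deployment dictates.
TRANSACTIONS = [
    "09:15:00,store-a,19.99",
    "09:47:12,store-b,5.00",
    "13:02:45,store-a,30.01",
    "17:30:00,store-b,12.50",
]

def map_record(line):
    """Map step: turn one raw line into a (store_id, amount_in_cents) pair."""
    _time, store_id, amount = line.split(",")
    return store_id, round(float(amount) * 100)

def shuffle(pairs):
    """Shuffle step: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_group(store_id, amounts):
    """Reduce step: collapse one store's amounts into a daily total."""
    return store_id, sum(amounts)

def end_of_day_report(lines):
    """Run the whole batch; every input line is read exactly once."""
    grouped = shuffle(map_record(line) for line in lines)
    return dict(reduce_group(key, values) for key, values in grouped.items())

report = end_of_day_report(TRANSACTIONS)  # totals in cents, keyed by store
```

Note that the job has to read every record before it can report anything, which is why this style suits nightly reports better than interactive queries.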
Don’t use Hadoop:
- For collecting and analyzing structured data. Because Hadoop is designed for processing vast stores of accumulated data, using it to store and analyze data that trickles in at a steady and predictable rate over time would be overkill.
- For handling workloads that are constant and predictable. Companies that adopt Hadoop without thinking it through can end up paying for storage space and computing power that they will never use.
- If costs are a concern. Unless the realized benefits outweigh the costs of deploying Hadoop, companies should turn to more cost-effective tools to meet their data storage and analytics needs.
- For time-sensitive data analysis. Despite its ability to handle large unstructured and semi-structured data sets, Hadoop is not a great fit when it comes to analyzing smaller data sets quickly. This is especially true in online environments that demand fast performance. Organizations that depend upon product recommendation engines, for example, should rely upon faster analytics tools equipped to process small amounts of information in real time or near real time.
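The time-sensitive case can be illustrated with a small sketch that contrasts two ways of answering the same question: scanning every record on each request, as a batch job would, versus looking the answer up in a structure built ahead of time. This is plain Python with hypothetical data and function names. It illustrates the access patterns only and makes no claim about how any particular recommendation engine is built.

```python
# Hypothetical purchase history: (customer_id, product_id) pairs.
PURCHASES = [
    ("c1", "p10"),
    ("c2", "p11"),
    ("c1", "p12"),
    ("c3", "p10"),
]

def products_by_scan(purchases, customer_id):
    """Batch style: read every record to answer a single question."""
    return [product for customer, product in purchases if customer == customer_id]

def build_index(purchases):
    """Built once, ahead of time (for example, by a nightly batch job)."""
    index = {}
    for customer_id, product_id in purchases:
        index.setdefault(customer_id, []).append(product_id)
    return index

INDEX = build_index(PURCHASES)

def products_by_lookup(index, customer_id):
    """Serving style: one dictionary lookup per request."""
    return index.get(customer_id, [])
```

Both functions return the same answer, but the scan's cost grows with the total number of records, while the lookup's cost does not. That is one reason a batch-oriented system and a low-latency serving tool are often paired rather than treated as substitutes.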
That being said, a number of tools and technologies, including cloud-based Hadoop platforms, are now available to enable real-time analysis of Hadoop data. Like all tools, Hadoop has limitations, particularly when it is used for purposes other than those for which it was designed. Although Hadoop is not the “Swiss Army knife” for data problems that its reputation suggests, organizations that exercise due diligence in deploying Hadoop will discover the value and benefits of using the right tool for the job.
About the author: Gil Allouche is the vice president of marketing at Hadoop-as-a-service provider Qubole. A former software engineer, Gil began his marketing career as a product strategist at SAP while earning his MBA at Babson College.