July 31, 2017

Hate Hadoop? Then You’re Doing It Wrong

Andrew Brust

(mw2st/Shutterstock)

There’s been a lot of Monday morning quarterbacking saying Hadoop never made sense in the first place and that it has met its demise. Comments from “Hadoop is slow for small, ad hoc jobs” to “Hadoop is dead and Spark is the victor” are now commonplace.

The frenzy around Hadoop’s deficiencies is almost as fierce as the initial hype around how powerful and disruptive it was. However, while it’s understandable that people have come up against difficulties deploying Hadoop, that doesn’t mean the negative chatter is true.

Out of Sight

TCP/IP powers the Internet, your email, your apps and more. But chances are you don’t hear about it. When you request a ride sharing service, stream media or surf the Internet, you’re benefitting from its power.

For the most part, we all rely on TCP/IP on a daily basis, yet have no interest or need to configure it. We don’t spend time going on our Macs typing commands like ifconfig to see how your WiFi adapter is configured to get online.

The complexity of the TCP/IP stack is mostly invisible to us now, as Hadoop’s complexity will eventually be

In the 1990s, TCP/IP used to be sold as a product, and adoption was somewhat tepid. Eventually, TCP/IP got built into operating systems and, perhaps paradoxically, that’s when it conquered all. It became a universal standard, and at the same time it disappeared from plain sight.

Hadoop Is Infrastructure

Similarly, Hadoop is the TCP/IP of the Big Data world. It’s the infrastructure that delivers huge benefits. But that benefit is greatly diluted when the infrastructure is exposed. Hadoop has been marketed like a Web browser, but it’s much more like TCP/IP.

If you’re working with Hadoop directly, you’re doing it wrong. If you’re typing “Hadoop” and a bunch of parameters at the command line, you’ve got it all backwards. Do you want to configure and run everything yourself, or do you just want to work with your data and let analytics software handle Hadoop on the back end?

Most people would choose the latter, but the Big Data industry often directed customers to the former. The industry did it with Hadoop before, and they’re doing it with Spark and numerous machine-learning tools now.

It’s a case of the technologist tail wagging the business user dog, and that never ends well.

Dev Tools Aren’t Biz Tools

It’s not that the industry has been totally oblivious to this problem. Some vendors have tried to up their tooling game and smooth out Hadoop’s rough edges. Open source projects with names like Hue, Jupyter, Zeppelin and Ambari have cropped up, aiming to get Hadoop practitioners off the command line.

Getting Hadoop users off the command line doesn’t necessarily mean they’re more productive

But therein lies the problem. We need tools for business users, not Hadoop practitioners. Hue is great for running and tracking Hadoop execution jobs or for writing queries in SQL or other languages. Jupyter and Zeppelin are great for writing and running code against Spark, in data science-friendly languages like R and Python, and even rendering the data visualizations that code produces.

The problem is these tools don’t get rid of command line tasks; they just make people more efficient at doing them. Getting people physically off the command line may be helpful, but having them do the same stuff, even if it’s easier, doesn’t really change the equation.

There’s a balancing act here. To do Big Data analytics right, you shouldn’t have to use the engine – Hadoop in this case – directly, but you still want its full power. To make that happen, you need an analytics tool that tames the technology, without dismissing it or shooing it away.

Find that middle ground and you’ll be on the right track.

The Path Forward for Hadoop

Hadoop isn’t dead, nor is it the problem. Hadoop is an extremely powerful, critical technology. But it’s also infrastructure. It never should have been the poster child for Big Data. Hadoop (and Spark, for that matter) is technology that should be embedded in other technologies and products. That way those technologies can, in turn, leverage their power, without exposing their complexity.

Hadoop’s not anymore dead than TCP/IP. The problem is how people have used it, not what it does. If you want to do big data analytics right, then exploit its power behind the scenes, where Hadoop belongs. If you do it that way, Hadoop will be resurrected, not by magic, but by common sense.

But you probably won’t even notice. And if that’s the case, then you’re doing it right.

About the author: Andrew Brust is Senior Director, Market Strategy and Intelligence at Datameer, liaising between the Marketing, Product and Product Management teams, and the big data analytics community. And writes a blog for ZDNet called “Big on Data” (zdnet.com/blog/big-data); is an advisor to NYTECH, the New York Technology Council; serves as Microsoft Regional Director and MVP; and writes the Redmond Review column for VisualStudioMagazine.com.

Related Items:

Hadoop Has Failed Us, Tech Experts Say

Hadoop at Strata: Not Exactly ‘Failure,’ But It Is Complicated

Charting a Course Out of the Big Data Doldrums

Applications: Enterprise Analytics

Technologies: Frameworks, Middleware

Vendors: Datameer, Microsoft, ZDNet

Tags: Ambari, Andrew Brust, business intelligence, Hadoop, open source

Only registered users may comment. Register using the form below.

Check off newsletters you would like to receive*
- HPCwire
- EnterpriseTech
- Datanami
- Technology Conferences & Events
- Advanced Computing Job Bank
- Technology Product Showcase
Email*
Name*
First Last
Organization*
Job Function*
Industry*
Country*
City*
State*
Province*
- Please check here to receive valuable email offers from Datanami on behalf of our select partners.

Hate Hadoop? Then You’re Doing It Wrong

Out of Sight

Hadoop Is Infrastructure

Dev Tools Aren’t Biz Tools

The Path Forward for Hadoop

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

April 26, 2024

April 25, 2024

April 24, 2024

Sponsored Partner Content

Get your Data AI Ready – Celebrate One Year of Deep Dish Data Virtual Series!

Supercharge Your Data Lake with Spark 3.3

Learn How to Build a Custom Chatbot Using a RAG Workflow in Minutes [Hands-on Demo]

Overcome ETL Bottlenecks with Metadata-driven Integration for the AI Era [Free Guide]

Gartner® Hype Cycle™ for Analytics and Business Intelligence 2023

The Art of Mastering Data Quality for AI and Analytics

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Top 6 Strategies for Reducing Data Warehouse Costs

Building an Operational Data Warehouse for Real-time Analytics

Sponsored Multimedia

The Power of DataOps: Bring Automation to Life
No Comments

Tactical Steps for Cloud Migration
No Comments

Immuta Data Access Platform
No Comments

Data Mesh: Fact or Fiction?
No Comments

Contributors

Featured Events

AI & Big Data Expo North America 2024

CDAO Canada Public Sector 2024

AI Hardware & Edge AI Summit Europe

AI Hardware & Edge AI Summit 2024

CDAO Government 2024

Hate Hadoop? Then You’re Doing It Wrong

Out of Sight

Hadoop Is Infrastructure

Dev Tools Aren’t Biz Tools

The Path Forward for Hadoop

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

April 26, 2024

April 25, 2024

April 24, 2024

Most Read Features

Most Read News In Brief

Most Read This Just In

Sponsored Partner Content

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Sponsored Multimedia

Contributors

Featured Events

Share

Copy short link