September 6, 2013

Cloudera Search 1.0: Like Googling Hadoop

Alex Woodie

Organizations using Cloudera’s Hadoop distribution have a new search option that promises to make it much easier to find information in Hadoop. In fact, Cloudera claims that its new Cloudera Search 1.0 offering is so easy that a Cloudera vice president compared it to using Google.

That comparison is funny because, of course, the Hadoop File System itself originated from the Google labs. That it took Cloudera to come up with a way to apply the “just Google it” metaphor to Google’s big data creation is a tad ironic.

But the fact remains that Hadoop distributions, such as Cloudera’s, can be quite complex beasts that don’t do “easy” very well and don’t offer “simple” interfaces, such as search functions. According to Cloudera, it’s a laborious, time consuming, and hardware-intensive task to set up a basic search engine to hit its open source CDH or commercial Cloudera Standard and Cloudera Enterprise implementations.

That all changes with Cloudera Search 1.0 and the accompanying add-on RTS (Real-time Search) subscription offering, which are based on the open source Apache Solr search engine. The company rolled out a beta of the software earlier this year and is now supporting as a generally available (GA) product.

According to Cloudera, its new search offering enables people to perform natural language keyword searches against HDFS and Apache HBase without advanced programming knowledge or extensive training. The company says this approach lets users explore their data and discover its “shape” quickly and easily during the data modeling stage.

Cloudera says this search capability is quite valuable to Hadoop users now, considering all the different types of data they’re socking away in the HDFS. But it will become even more useful in the future, as organizations continue to load their Hadoop clusters with data that, in a previous era, would have been destined for a traditional relational data store.

“We’ve taken what was once a relatively complicated and involved freestanding system, requiring its own hardware and operational model, and turned it into a feature of a larger, more ubiquitous open source platform, CDH,” states Cloudera vice president of products, Charles Zedlewski. “With Cloudera Search, Hadoop deployments can now be explored with the same ease of use and speed as a simple Google search engine query, empowering our customers to achieve rapid insights from a fully integrated platform.”

On the front end, the search metaphor is quite easy to grasp: just Google it, after all. This is done via Cloudera Hue, the company’s Web-based UI. On the backend, Cloudera Search employs several components, including two types of indexers. This includes batch indexing of Hadoop and HBase “that is comparable to MapReduce”; and (near) real-time indexing of events via the Apache Flume and the Lily HBase Indexer. The software also features field extraction of Hadoop-optimized file formats, such as Apache Avro. Management of the software is handled with Cloudera Manager.

Cloudera Search is based on the Apache Solr project, but it bolsters it by introducing a new Solr indexer sink, which Cloudera says can be used within arbitrary Flume architectures to offer near real-time document processing and indexing.

All Cloudera Search components reside on the same cluster as other engines in Cloudera’s distro, including batch processes like MapReduce, Hive, or Pig; interactive SQL processes like Impala, and machine learning processes like Mahout. The advantage of this approach is that everything is consolidated, thereby reducing the maintenance overhead and simplifying operations.

One early user of Cloudera Search is the agriculture giant Monsanto. “Incorporating Search into our existing Cloudera environment allows us to index images of plants at various stages in their lifecycles in real-time, which accelerates product development,” said Monsanto’s R&D big data engineer Jeff Melching in a Cloudera Search brochure. “This ultimately supports our goal of helping farmers produce enough food to sustain our growing global population while conserving natural resources.”

Related Articles

Twitter Conjures Up a Hadoop-Storm Hybrid, Ponders IPO

SoundCloud Liberates Data with Hadoop, Pentaho

Data Driving the Exit Into Hadoop

Vendors: Cloudera

Tags: big data, cloudera, cloudera search, google, Hadoop

Only registered users may comment. Register using the form below.

Check off newsletters you would like to receive*
- HPCwire
- EnterpriseTech
- Datanami
- Technology Conferences & Events
- Advanced Computing Job Bank
- Technology Product Showcase
Email*
Name*
First Last
Organization*
Job Function*
Industry*
Country*
City*
State*
Province*
- Please check here to receive valuable email offers from Datanami on behalf of our select partners.

Cloudera Search 1.0: Like Googling Hadoop

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

April 17, 2024

April 16, 2024

Sponsored Partner Content

Get your Data AI Ready – Celebrate One Year of Deep Dish Data Virtual Series!

Supercharge Your Data Lake with Spark 3.3

Learn How to Build a Custom Chatbot Using a RAG Workflow in Minutes [Hands-on Demo]

Overcome ETL Bottlenecks with Metadata-driven Integration for the AI Era [Free Guide]

Gartner® Hype Cycle™ for Analytics and Business Intelligence 2023

The Art of Mastering Data Quality for AI and Analytics

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Building an Operational Data Warehouse for Real-time Analytics

Can You Use Kafka as a Database?

Sponsored Multimedia

The Power of DataOps: Bring Automation to Life
No Comments

Tactical Steps for Cloud Migration
No Comments

Immuta Data Access Platform
No Comments

Data Mesh: Fact or Fiction?
No Comments

Contributors

Featured Events

Call & Contact Center Expo

AI & Big Data Expo North America 2024

AI Hardware & Edge AI Summit 2024

CDAO Government 2024

Cloudera Search 1.0: Like Googling Hadoop

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

April 17, 2024

April 16, 2024

Most Read Features

Most Read News In Brief

Most Read This Just In

Sponsored Partner Content

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Sponsored Multimedia

Contributors

Featured Events

Share

Copy short link