October 30, 2013

Splunk Pumps Up Big Data with Hunk

Alex Woodie

Hadoop users who are looking for another way to explore and visualize their big data sets may want to check out Hunk, the new product built on MapReduce that Splunk shipped today. Hunk lets users apply the same type of data visualization and analytic processing that Splunk Enterprise users are accustomed to, but against any data residing within Hadoop.

If one were to follow the buzz emanating from the area of machine-generated data, it would likely trace back to Splunk. The San Francisco software company is doing a bang-up job of transitioning from the tired and boring world of IT log consolidation and management into the bright and shiny new world called the Internet of Things.

Splunk’s big data story has unfolded in parallel to the rise of Hadoop. Whereas many companies have pumped all sorts of semi-structured and unstructured data into their Hadoop repositories with the strategic idea that it will be useful at some point in the future, Splunk deployments typically follow a more tactical approach.

Many Splunk customers, such as Domino’s Pizza, start out collecting IT-related information from applications, Web servers, databases, networks, telecom equipment, and sensors, with the goal of driving efficiency into IT processes. After they become familiar with Splunk and grow to like its dashboards, report generation, and real-time alerting capabilities, they start applying Splunk to other types of data. In Domino’s case, it expanded its use of Splunk to analyze food orders coming in over the Web.

Splunk Enterprise runs on standard Windows and Linux-based servers, and stores data in “Splunk Buckets” running on local disk or SANs. If a customer is storing data in a standard relational database, it can be brought over with connectors. No fancy Hadoop or exotic NoSQL data stores here.

The company started bringing some Hadoop-resident data into the “Splunkesphere” in 2012 with the launch of Splunk Hadoop Connect. The problem with that approach is that some data sets in Hadoop are simply too big to move into the Splunk environment. (In many cases, that’s why the data is in Hadoop in the first place.)

So Splunk made Hunk specifically to tackle this problem, and to enable users to extend their investment in Splunk Enterprise and apply it to Hadoop-resident data. It’s another take on the in-database analytic approach that has become popular recently.

Hunk runs atop any standard Hadoop distribution, and effectively delivers the Splunk Enterprise stack for Hadoop. This enables users to build and consume the same types of analytical and data visualization dashboards and reports for Hadoop-bound data as Splunk Enterprise users can for machine-generated data stored in a standard NFS or CIFS file system.

There are a couple of interesting technical differentiators in Hunk that are worth pointing out. For starters, the company touts what it calls its Splunk Virtual Index, which it says “decouples the data storage tier from the data access and analytics tiers.” The net result of this is that it speeds up search times in Hadoop. It also allows users to search Splunk Enterprise and Hadoop data stores with a single query.

Then there’s “schema on the fly,” another technology under Hunk’s covers. Schema on the fly applies structure to data the moment a query is run, according to Splunk. This allows users to explore the data sets as they see fit, without having to think about the questions they would like to ask of the data beforehand, as they would do if using SQL or Apache Hive against their Hadoop data. The software will automatically add structure and identify things in the data that would most likely interest the user, such as keywords, patterns, and top values.
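To make the idea concrete, here is a minimal Python sketch of applying structure at query time rather than at load time. It is not Splunk's implementation: the sample events, the key=value extraction rule, and the function names `extract_fields` and `top_values` are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical raw events, stored as unparsed text with no schema declared up front.
RAW_EVENTS = [
    "2013-10-30T09:00:01 status=200 method=GET path=/order",
    "2013-10-30T09:00:02 status=500 method=POST path=/order",
    "2013-10-30T09:00:03 status=200 method=GET path=/menu",
]

# Assumed extraction rule: any key=value token becomes a field.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(line):
    """Derive fields from one raw line at the moment it is queried."""
    return dict(KV_PATTERN.findall(line))


def top_values(events, field, n=3):
    """Count the most common values of a field, parsing each event on the fly."""
    counts = Counter()
    for line in events:
        fields = extract_fields(line)
        if field in fields:
            counts[fields[field]] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_values(RAW_EVENTS, "status"))
    print(top_values(RAW_EVENTS, "path"))
```

Because the raw text is never rewritten, a different question (say, top values of `method`) needs no reload or table definition, only a different field name at query time.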

Splunk says results start coming back immediately after a user submits a query in Hunk, while the MapReduce job continues to run in the background. However, don’t confuse this with Storm or another streaming Hadoop technology. Events cannot be streamed into Hunk for real-time analysis, as they can in Splunk Enterprise. This software is still part of a batch-oriented workflow. There is no real-time searching of Hadoop data in Hunk, although a preview of this is available. Time-series data also isn’t supported, and data models and report acceleration features are not available in Hunk.
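The distinction between early results and true streaming can be sketched in a few lines of Python. This is an illustration only, assuming a batch job that works through fixed chunks of already-stored data; the function name `preview_counts` and the sample records are hypothetical and do not reflect how Hunk schedules MapReduce work.

```python
from collections import Counter


def preview_counts(splits, field_index=0):
    """Yield a running tally after each chunk of a batch job.

    The caller sees partial results early, but the input is a fixed set
    of stored records, not a live event stream.
    """
    running = Counter()
    for split in splits:
        for record in split:
            running[record.split()[field_index]] += 1
        # Hand back a snapshot so later updates do not alter earlier previews.
        yield dict(running)


if __name__ == "__main__":
    # Hypothetical stored data, already divided into two chunks.
    stored_splits = [["GET /a", "POST /b"], ["GET /c"]]
    for partial in preview_counts(stored_splits):
        print(partial)
```

Each snapshot is a partial answer over data that was already at rest when the query began, which is why this pattern still counts as batch processing.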

Hunk applications are powered by the same Search Processing Language that powers Splunk Enterprise applications. But users don’t have to worry about learning that language to use Hunk, since all that’s required to build Hunk apps (charts, dashboards, visuals) is knowledge of the Pivot user interface. More advanced developers will use the Web framework component of Hunk, which includes an SDK that lets developers work in their favorite languages (C#, Java, PHP, etc.). Developers can also bring in pre-made UI components from their favorite JavaScript library or other libraries.

Hunk supports all standard Hadoop distros, including Hadoop version 1.2 and Hadoop version 2 offerings from Amazon EMR, Cloudera, Hortonworks, IBM BigInsights, MapR Technologies, and Pivotal. Pricing starts at $2,500 per Hadoop node.

