April 16, 2013

IBM Tackling Hadoop with Machine Data Accelerator

Ian Armas Foster

IBM has been fairly visible on the big data front over the last couple of years, putting their analytics to the test in several use cases, including healthcare, state governments, and even the US Open tennis tournament.

Their next challenge is taking on Hadoop and making it more accessible for end users through their Machine Data Accelerator. IBM BigInsights Technical Sales Lead Dirk de Roos discussed how the Accelerator seeks to alleviate bottlenecks in machine log data.

“The Machine Data Accelerator is a set of building blocks to enable people to do machine data analysis or log analysis using our BigInsights offering,” said de Roos, providing an overview of how the Accelerator relies on IBM’s analytics to tackle massive datasets from machine logs.

For de Roos, big data problems essentially come down to finding patterns in datasets, except that those datasets are a million to a trillion times bigger than what the end user is accustomed to. “Log data,” de Roos said, “machine data, sensor data, that data tends to accumulate at an incredibly rapid rate, especially nowadays when we have so many sensors and systems that generate these system logs.”

Those large datasets come from machines and sensors, which are growing exponentially both in the amount of data each can generate and in their sheer numbers. According to de Roos, Hadoop was built for processing and analyzing such log data. However, a gulf remains between Hadoop’s capacity and people’s capacity to work with it. “Hadoop, even in its original incarnation, it was designed for large scale analysis of log data. But even though Hadoop was designed for that, it wasn’t built with out of the box tools.”

To understand how IBM’s technology handles these issues, de Roos first delved into what specifically makes finding statistical patterns in big data difficult. A significant problem, according to de Roos, lies in the diversity of the data. Something as simple as timestamp format can throw analytics engines for a loop.

“It’s very important to normalize the data in these log files.” Lack of normalization often leads to errors in MapReduce queries, as standard Hadoop query functions pick up more noise than desirable when the data lacks clarity.
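To make the timestamp problem concrete, here is a minimal sketch of the kind of normalization de Roos describes. The log formats, function name, and default year are illustrative assumptions, not part of IBM's accelerator:

```python
from datetime import datetime

# Hypothetical timestamp formats that might appear across different log sources.
FORMATS = [
    "%Y-%m-%d %H:%M:%S",   # 2013-04-16 09:30:00
    "%d/%b/%Y:%H:%M:%S",   # 16/Apr/2013:09:30:00 (Apache-style)
    "%b %d %H:%M:%S",      # Apr 16 09:30:00 (syslog-style, no year)
]

def normalize_timestamp(raw, default_year=2013):
    """Parse a raw timestamp string and return one canonical ISO form."""
    for fmt in FORMATS:
        try:
            dt = datetime.strptime(raw, fmt)
            if dt.year == 1900:          # syslog entries omit the year
                dt = dt.replace(year=default_year)
            return dt.isoformat()
        except ValueError:
            continue
    return None  # leave unparseable entries for later inspection

print(normalize_timestamp("16/Apr/2013:09:30:00"))  # 2013-04-16T09:30:00
```

Once every record carries the same canonical timestamp, downstream MapReduce jobs can compare and sort events from different systems without the format noise de Roos warns about.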

IBM’s notion is to build what they call ‘sessions’ of data, which standardize the records or at least weed out the unclear data. After all, people looking at big data are generally looking for patterns and statistical trends, and de Roos hopes IBM can help accomplish that.

“What is unique is the ability that we have in the machine data accelerator to build sessions out of fairly big volumes of log data and then do statistical analysis on those sessions. Again it’s unique, it’s interesting from the perspective of people who are looking for patterns and are looking for the needle in the haystack.”
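The session-building de Roos describes can be sketched in miniature: group log events per source and start a new session whenever a long gap in activity appears, leaving each session ready for statistical summarization. The event data, host names, and 30-minute gap threshold below are hypothetical; IBM's actual accelerator operates on far larger volumes inside BigInsights:

```python
from datetime import datetime, timedelta
from itertools import groupby

# Hypothetical (host, ISO timestamp) log events; a real workload would be
# millions of normalized machine log records.
events = [
    ("web01", "2013-04-16T09:00:00"),
    ("web01", "2013-04-16T09:00:45"),
    ("web01", "2013-04-16T09:40:00"),  # >30 min of silence: new session
    ("web02", "2013-04-16T09:05:00"),
]

def sessionize(events, gap=timedelta(minutes=30)):
    """Split each host's chronological events into inactivity-bounded sessions."""
    sessions = []
    for host, evs in groupby(sorted(events), key=lambda e: e[0]):
        current, last = [], None
        for _, ts in evs:
            t = datetime.fromisoformat(ts)
            if last is not None and t - last > gap:
                sessions.append((host, current))  # close the old session
                current = []
            current.append(t)
            last = t
        sessions.append((host, current))
    return sessions

for host, sess in sessionize(events):
    print(host, len(sess))  # session lengths feed later statistical analysis
```

With events bucketed into sessions, pattern-hunting reduces to statistics over session attributes (length, duration, event mix), which is where the needle-in-the-haystack searches de Roos mentions become tractable.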

Separating the signal from the noise has been at the heart of IBM’s entire analytics arm, and now the company is seeking to apply those techniques to Hadoop.
