February 02, 2013

The Week in Big Data Research


This week’s research brief brings news of cutting-edge work at global centers as teams keep refining Hadoop’s security, reliability and functionality, both on premises and in cloud environments. We also take a look at some interesting uses of MapReduce in projects that address image processing requirements, not to mention data integrity.

In case you missed it, here is last week's edition of the Week in Research... Without further delay, let’s launch in with our top item this week:

A Leap Toward Hadoop Fault Tolerance

A team of researchers from Osaka University has proposed an approach to building a more fault-tolerant Hadoop through an auto-recovering JobTracker.

The team notes that while Hadoop provides a decent level of reliability, its job scheduler, the JobTracker, remains a single point of failure for many systems. Specifically, if the JobTracker fail-stops during a job execution, the job is cancelled immediately and all of the intermediate results are lost.

To counter this, the team points to its auto-recovery system, which tolerates a fail-stop of the JobTracker without adding any hardware. The approach rests on a checkpoint method wherein a snapshot of the JobTracker is stored on a distributed file system at regular intervals. When the system detects a fail-stop via a timeout, it automatically recovers the JobTracker from the most recent snapshot.

According to the research team, the key feature here is this “transparent recovery such that a job execution continues during a temporary fail-stop of the JobTracker and completes itself with a little rollback. The system achieves fault tolerance for the JobTracker with overheads less than 43% of the total execution time.”
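The paper's implementation details are not reproduced in this brief, but the general pattern the team describes, periodic snapshots of the JobTracker plus timeout-based failure detection, can be sketched roughly as below. Everything in the sketch (paths, timeout value, state layout) is illustrative rather than taken from the paper.

```python
import json
import time
from pathlib import Path

# Illustrative values only; the paper stores snapshots on a distributed file system.
CHECKPOINT_PATH = Path("/tmp/jobtracker.ckpt")
HEARTBEAT_TIMEOUT = 60.0   # seconds without a heartbeat before assuming a fail-stop


def write_checkpoint(tracker_state: dict) -> None:
    """Persist a snapshot of the scheduler's in-flight job state."""
    CHECKPOINT_PATH.write_text(json.dumps(tracker_state))


def restore_checkpoint() -> dict:
    """Rebuild the scheduler state from the most recent snapshot."""
    return json.loads(CHECKPOINT_PATH.read_text())


def fail_stopped(last_heartbeat: float, now: float) -> bool:
    """Timeout-based detection: no heartbeat for too long means a fail-stop."""
    return (now - last_heartbeat) > HEARTBEAT_TIMEOUT


if __name__ == "__main__":
    state = {"jobs": {"job_001": {"maps_done": 42, "reduces_done": 3}}}
    write_checkpoint(state)                        # taken at regular intervals
    if fail_stopped(last_heartbeat=0.0, now=time.time()):
        recovered = restore_checkpoint()           # restart the JobTracker from here
        print("resuming jobs:", list(recovered["jobs"]))
```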

A Reliable BigTable for the Public Cloud

A team from North Carolina State University is taking aim at data integrity for BigTable, the distributed storage system for big data, in the context of public cloud environments.

The researchers note that while rolling out BigTable on public clouds is attractive from a cost-savings point of view, especially for small businesses or smaller research groups with big data problems, these users face several issues when considering public clouds, ranging from security concerns to worries over the integrity of data held in a storage system running in the cloud.

What they’ve proposed to counter these concerns is called “iBigTable” which is an “enhancement of BigTable that provides scalable data integrity assurance.” They have considered the practicality of different authenticated data structures around BigTable and have designed a set of security protocols to “efficiently and flexibly verify the integrity of data returned by BigTable so that existing applications over BigTable can interact with iBigTable seamlessly with minimum or no change of code.”
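The article does not say which authenticated data structures the team settled on, but such structures are commonly built on Merkle hash trees, where a client keeps a single root digest and can detect tampering in anything an untrusted store returns. The toy sketch below recomputes the root over a full returned range, a simplification of per-row verification with authentication paths; it is illustrative only and is not the iBigTable protocol.

```python
import hashlib


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(rows: list[bytes]) -> bytes:
    """Fold row hashes pairwise up to a single root digest."""
    level = [sha(r) for r in rows]
    while len(level) > 1:
        if len(level) % 2:                      # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# The client keeps only the root; the untrusted store returns rows on a scan.
rows = [b"row1:alice", b"row2:bob", b"row3:carol"]
trusted_root = merkle_root(rows)                # computed when the data was written

assert merkle_root([b"row1:alice", b"row2:bob", b"row3:carol"]) == trusted_root
assert merkle_root([b"row1:alice", b"row2:mallory", b"row3:carol"]) != trusted_root
```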

To validate their model, the team implemented a prototype of iBigTable based on HBase, which is itself an open source implementation of BigTable. They were able to demonstrate that iBigTable can offer “reasonable performance overhead while providing integrity assurance.”

On a More Robust, Secure Cloud-Based Hadoop

A research team from National Sun Yat-sen University in Taiwan is also tackling security issues for a cloud-based Hadoop framework, while turning its eyes toward the important matter of overall application performance.

The researchers note that cloud computing platforms offer a convenient solution to the challenges of processing large-scale data in both academia and industry, beyond what could be achieved with traditional on-site clusters. They say, however, that while a great number of online cloud services make attractive environments, security is becoming an ever more significant issue for cloud users.

According to the group, because Hadoop-based cloud platforms are currently a well-known service framework, it has focused its investigation on Hadoop's authentication and encryption mechanisms.

The team has constructed what it calls a secure Hadoop platform with low deployment cost, robust attack prevention, and little performance degradation. To validate the design, they ran a number of simulations evaluating performance under different parameter settings and cryptographic algorithms.
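The team's simulation code is not part of the article, but a rough feel for how cryptographic choices translate into per-block overhead can be had with a few lines of Python. The sketch below times authenticating a block of data with different hash algorithms via HMAC; the block size and algorithm list are assumptions for illustration, not the team's settings, and encryption costs would need a separate crypto library to measure.

```python
import hashlib
import hmac
import os
import time

payload = os.urandom(16 * 1024 * 1024)   # 16 MB stand-in for a data block (assumed size)
key = os.urandom(32)

for algo in ("md5", "sha1", "sha256", "sha512"):
    start = time.perf_counter()
    hmac.new(key, payload, getattr(hashlib, algo)).digest()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"HMAC-{algo.upper():>6}: {elapsed_ms:6.1f} ms per block")
```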

The researchers note that the simulation results demonstrate the feasibility of the security mechanisms, and they conclude that the key to building cloud platforms with appropriate security is to weigh the application's requirements, striking a better trade-off between security and user requirements.

Pegging the Outliers in Big Data

According to a research team from the Centre for AI and Robotics in Bangalore, India, the rapid growth of the field of data mining has led to the development of various methods for outlier detection.

Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. To address this, the team has proposed a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers.

In the first phase, the algorithm computes a clustering of the given data; the second, ranking phase then determines the set of most likely outliers. The team says the proposed algorithm can perform better because it can identify different types of outliers, employing two independent ranking schemes based on attribute value frequencies and on the inherent clustering structure of the data.
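The paper's precise outlier definition and clustering step are not reproduced in this summary, but the frequency-based half of the ranking idea, where rare attribute values push a record up the outlier list, can be sketched as below. The data and scoring are invented for illustration, and the clustering-based scheme is omitted.

```python
from collections import Counter

# Toy categorical records: (color, size, material)
records = [
    ("red", "small", "metal"),
    ("red", "small", "metal"),
    ("red", "large", "metal"),
    ("blue", "small", "metal"),
    ("green", "tiny", "wood"),        # intuitively the outlier
]

# Frequency of each value within its attribute (column).
column_counts = [Counter(column) for column in zip(*records)]


def rarity_score(record: tuple) -> float:
    """Sum of inverse attribute-value frequencies; higher means more outlying."""
    return sum(1.0 / column_counts[i][value] for i, value in enumerate(record))


ranked = sorted(records, key=rarity_score, reverse=True)
print("most likely outlier:", ranked[0])       # ('green', 'tiny', 'wood')
```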

The team says that unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm was demonstrated through experiments on various public domain categorical data sets.

MapReduce Paves the Way for CBIR

Recently, content-based image retrieval (CBIR) has gained active research attention due to its wide applications in areas such as crime prevention, medicine, historical research and digital libraries.

As a research team from the School of Science, Information Technology and Engineering at the University of Ballarat, Australia has suggested, image collections held in databases at distributed locations across the Internet make it challenging to retrieve images relevant to user queries efficiently and accurately.

The researchers say that with this in mind, it has become increasingly important to develop new CBIR techniques that are effective and scalable for real-time processing of very large image collections. To address this, they offer up a novel MapReduce neural network framework for CBIR over large data collections in a cloud environment.

The team has adopted natural language queries, using a fuzzy approach to classify the color of images based on their content, and applies Map and Reduce functions that operate in cloud clusters to arrive at accurate results in real time. Preliminary experimental results for classifying and retrieving images from large data sets were convincing enough to warrant further experimental evaluation.
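The framework itself is not published alongside the article, so the sketch below only shows the general shape such a pass might take: a mapper that assigns fuzzy color-class memberships to pixel hues and a reducer that averages them per image. The membership functions, color classes and data are invented for illustration and are not the team's model.

```python
from collections import defaultdict


def fuzzy_color_memberships(hue: float) -> dict:
    """Invented triangular memberships for two color classes over hue in degrees."""
    red = max(0.0, 1.0 - min(abs(hue - 0.0), abs(hue - 360.0)) / 60.0)
    blue = max(0.0, 1.0 - abs(hue - 240.0) / 60.0)
    return {"red": red, "blue": blue}


def map_phase(image_id: str, hues: list):
    """Mapper: emit ((image_id, color_class), membership) for every pixel hue."""
    for hue in hues:
        for color, weight in fuzzy_color_memberships(hue).items():
            yield (image_id, color), weight


def reduce_phase(pairs) -> dict:
    """Reducer: average the memberships for each (image, color class) key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, weight in pairs:
        sums[key] += weight
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


pairs = map_phase("img_001", hues=[5.0, 350.0, 230.0, 250.0])
print(reduce_phase(pairs))    # e.g. ~0.44 for ('img_001', 'red'), ~0.42 for ('img_001', 'blue')
```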
