March 29, 2013

This Week in Big Data Research

Datanami Staff

This week’s big data research rundown features thought leadership from all corners of the globe, with researchers seeking solutions to challenges in everything from big data storage, to multi-tenant, heterogeneous operating systems, to new translators for MapReduce jobs. Buckle up and secure your Google Glass; here’s your weekly ride in big data research.

Is Synthetic DNA the Next Generation of Big Data Storage?

Synthetic DNA may hold the solution for the growing data challenges of a big data future, say researchers at the Cork Institute of Technology in Ireland.

“With world wide data predicted to exceed 40 trillion gigabytes by 2020, big data storage is a very real and escalating problem,” write researchers Aisling O’Driscoll and Roy D. Sleator. The solution, they say, lies not in thinking larger but in thinking smaller.

For this, they turn to recently published research out of Harvard and EMBL-EBI indicating that DNA is a high-capacity storage medium with a theoretical storage potential of 455 exabytes per gram of ssDNA. The research theorizes that all of the world’s projected 40 ZB of data could be stored in ~90 g of DNA.
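The ~90 g figure follows directly from those two numbers; a quick back-of-the-envelope check:

```python
# Sanity check of the figures above (values taken from the article).
ZB_PER_EB = 1000                      # 1 zettabyte = 1,000 exabytes
world_data_zb = 40                    # projected global data by 2020, in ZB
dna_capacity_eb_per_g = 455           # theoretical ssDNA capacity, EB per gram

grams_needed = world_data_zb * ZB_PER_EB / dna_capacity_eb_per_g
print(f"{grams_needed:.0f} g of ssDNA")  # about 88 g, i.e. roughly 90 g
```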

The team notes that there have already been attempts to use DNA as a workable canvas for archival purposes: a recently constructed bacterial cell with a completely synthetic genome has been used to store such things as a web address, the names of researchers, and quotes from notable luminaries.

Many problems remain, but the researchers say that despite the economic impracticality of DNA storage in 2013, this surprisingly simple idea has the potential to reshape the global face of data storage in the not-too-distant future.


A Real Time Big Data Processing Framework for Health Sensors

The increasing use of low-cost sensors for health monitoring is driving the rise of Medical Cyber Physical Systems (MCPS), say researchers from Konkuk University in South Korea, enabling systems for the remote monitoring of patients in real time.

This rise in MCPS, however, exposes an efficiency gap in existing platforms, the researchers say: they choke on the data, failing to process the massive amount of vital information in real time. Latency in health applications, of course, will not do. To advance the field, the researchers argue that computing frameworks and infrastructures need a fresh look. Enter their Bigdata processing framework.

Incorporating Hadoop and designed specifically for MCPS, Bigdata, the researchers say, offers a new paradigm for integrating various cyber and physical aspects, providing a complete solution for handling various clinical use cases where the sensors are the data paintbrush.

The framework’s techniques include “stream data pre-processing,” an accelerator module for improved communication performance, and an awareness module that provides semantic output which can be acted on for better health outcomes.
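The paper’s abstract doesn’t detail the pre-processing step, but a minimal sketch of what windowed pre-processing of a vital-sign stream might look like (the function name, thresholds, and window size here are our own illustrative choices, not the framework’s):

```python
from collections import deque

def smooth_heart_rate(samples, window=5):
    """Hypothetical pre-processing: discard sensor glitches, then smooth
    a heart-rate stream with a sliding-window average."""
    buf = deque(maxlen=window)
    out = []
    for bpm in samples:
        if not 20 <= bpm <= 250:      # drop physically implausible readings
            continue
        buf.append(bpm)
        out.append(sum(buf) / len(buf))
    return out

readings = [72, 74, 0, 73, 300, 75, 74]   # 0 and 300 are sensor glitches
print(smooth_heart_rate(readings))        # [72.0, 73.0, 73.0, 73.5, 73.6]
```

Real MCPS pipelines would run logic like this continuously over incoming sensor streams before any heavier analytics see the data.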


A Two-Headed Operating System for High Performance Systems

With advancements happening in HPC, there is a strong incentive to open up the large-scale distributed systems used in HPC to new scalable workloads, say researchers from IBM Research and the Karlsruhe Institute of Technology.

Statistical analysis of huge amounts of big data is increasingly perceived as the next great challenge for computing, and new ways of interacting with and processing these large amounts of data are needed. To better accommodate this, the researchers propose FusedOS: the fusing of two OS personalities in a way that allows heterogeneous cores to process data and interact with each other.

Early experiments with FusedOS have already been run on the IBM Blue Gene/Q. The researchers fused an existing specialized lightweight kernel used by Blue Gene with general-purpose Linux, with the two kernels playing different, heterogeneous roles.

The researchers report that while system call latency in the prototype turned out to be very high, the performance of HPC applications was surprisingly good.


Real Time Data Driven Decision-Making in E&P

New technologies have paved the way for real-time computing paradigms in exploration and production (E&P), says IBM Software Group researcher Michael R. Brulé in a conference paper for the Society of Petroleum Engineers.

Brulé suggests that stream computing, which analyzes high-frequency data for real-time complex event processing, can now be combined with Hadoop MapReduce and other NoSQL approaches to support the legacy physics-based modeling and simulation methods the E&P industry has built up over the years. Combining these computing paradigms enables “Real-Time Adaptive Analytics,” says Brulé.

Achieving this combination of computing modes would provide the industry with a low-latency “Real-Time Data Flow Architecture,” which Brulé says would enable data-driven decision making during operational events at the speed of business.


Get S2MART: Smart SQL to MapReduce Translators for Bigger Data

Researchers at Anna University in Chennai, India, believe that conventional SQL-based data processing (think Hive and Pig) has limited scalability.

The rapid increase in the size of data in large systems, say the researchers, and the rise of big data make it necessary to build more efficient and flexible SQL-to-MapReduce translators that can make terabytes and petabytes of data easy to access and retrieve.

To that end, the team proposes a Smart SQL to MapReduce Translator (S2MART) to transform SQL queries into MapReduce jobs more efficiently.
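S2MART’s internals aren’t described here, but the core idea of translating SQL into a MapReduce job can be illustrated with a toy GROUP BY (the function names and sample data are ours, not the paper’s):

```python
from itertools import groupby
from operator import itemgetter

# Toy translation of:  SELECT dept, SUM(salary) FROM emp GROUP BY dept
def map_phase(row):
    # emit (key, value) pairs keyed by the GROUP BY column
    yield (row["dept"], row["salary"])

def reduce_phase(key, values):
    # apply the aggregate (SUM) once per key
    return (key, sum(values))

emp = [
    {"dept": "eng",   "salary": 100},
    {"dept": "sales", "salary": 80},
    {"dept": "eng",   "salary": 120},
]

pairs = sorted(kv for row in emp for kv in map_phase(row))   # shuffle/sort
result = [reduce_phase(k, [v for _, v in grp])
          for k, grp in groupby(pairs, key=itemgetter(0))]
print(result)  # [('eng', 220), ('sales', 80)]
```

A real translator generates this kind of mapper/reducer pair from the parsed query plan and ships it to the cluster; the hard part, which S2MART targets, is doing so without redundant jobs and data movement.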

The project seeks to employ a spiral-modeled database, aiming to reduce data and network transfer costs and minimize redundant operations.

 
