
January 19, 2013

The Week in Big Data Research


This week's big data research and development stories cover a wide range of topics, as researchers from Europe, China, and the United States figure out how to use big data to solve biomedical problems on this planet and to find and catalog planets around other stars. Visualizing big data and archiving it in large databases also got their share of attention this week.

Without further ado, here begins 2013's first Week in Big Data Research.

Big Planetary Census Data

Fifteen years ago, the salient features of the known extrasolar planets could be written down on an index card. At present the catalog of extrasolar planets numbers in the thousands, and the rate of detection is increasing rapidly.

A highly diverse population of planets is being identified through an equally diverse set of observational techniques: photometric transit detection, Doppler radial velocimetry, gravitational microlensing, and direct detection via adaptive optics imaging are all producing discoveries at an increasing rate.

In a recent talk at the Intelligent Data Understanding Conference, Greg Laughlin presented an overview of the census as currently understood, then showed how the different detection methods are producing complementary detections.

Like many areas of astronomy, exoplanet detection is facing issues related to "big data". Large online repositories (such as the one produced by the Kepler Mission) serve many terabytes of data, much of which has gone unanalyzed because of the time-consuming algorithms required. Laughlin's talk highlighted the current issues and showed how ad-hoc collaborations across the community are forming to deal with the challenges (and the excitement) of this fast-moving area.
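To give a sense of why searching large light-curve archives is time-consuming, here is a minimal sketch of a brute-force box search for transit-like dips. It is a simplified illustration, not the Kepler pipeline or any method from Laughlin's talk; the function names, search grids, and synthetic light curve are all assumptions made for the example. The work grows as (number of trial periods) × (number of trial phases) × (number of observations), which quickly adds up across thousands of long light curves.

```python
import math


def box_search(times, fluxes, periods, n_phases=50, duration_frac=0.05):
    """Brute-force box-shaped dip search (simplified, illustrative only).

    For each trial period and start phase, points whose folded phase falls in a
    window of width `duration_frac` count as "in transit". The score is the mean
    out-of-transit flux minus the mean in-transit flux, so a deeper dip scores
    higher. Returns (best_score, best_period, best_phase).
    """
    best = (-math.inf, None, None)
    for period in periods:
        # Fold every observation onto the unit interval for this trial period.
        folded = [(t % period) / period for t in times]
        for k in range(n_phases):
            start = k / n_phases
            inside, outside = [], []
            for phase, flux in zip(folded, fluxes):
                # Modulo 1.0 lets a window that starts near phase 1.0 wrap to 0.0.
                if (phase - start) % 1.0 < duration_frac:
                    inside.append(flux)
                else:
                    outside.append(flux)
            if inside and outside:
                score = sum(outside) / len(outside) - sum(inside) / len(inside)
                if score > best[0]:
                    best = (score, period, start)
    return best


def synthetic_light_curve(n=2000, step=0.01, period=3.0, phase=0.2, width=0.05, depth=0.01):
    """Flat light curve with a periodic box-shaped dip (hypothetical test data)."""
    times = [i * step for i in range(n)]
    fluxes = [1.0 - depth if ((t % period) / period - phase) % 1.0 < width else 1.0
              for t in times]
    return times, fluxes


if __name__ == "__main__":
    times, fluxes = synthetic_light_curve()
    score, period, phase = box_search(times, fluxes, periods=[2.0, 2.5, 3.0, 3.5, 4.0])
    print(f"best period={period}, phase={phase:.2f}, depth~{score:.4f}")
```

On the synthetic data the search should recover the injected 3.0-day period. Real pipelines use far denser grids and detrending, which is exactly where the computational cost comes from.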


A Language for Big Data Visualization

Researchers from the University of California at Berkeley see increasing data availability as one of the forces behind some recent innovations. However, according to the Berkeley team, visualization technology for exploring that data is not keeping up.

They argue that designers may be forced to choose between scale and interactivity. Big displays are attractive because they can show an entire data set, but the Berkeley researchers note that viewers also want interactivity.

Performance constraints, however, limit interaction to a small slice of the data. To address this, the Berkeley team presented SUPERCONDUCTOR, a high-level visualization language for interacting with large data sets. It has three design goals: scale, interactivity, and productivity.

In their presentation, the team showed how high-level programming abstractions support automatic parallelization. They examined three cases: selectors, layout, and rendering. In the case of layout, declarative constructs can further guide parallelization. Together, these ideas support the team's goal of high-level programming of big, interactive visualizations.
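How declarative layout rules can expose parallelism can be illustrated with a minimal sketch. Assume (hypothetically; this is not SUPERCONDUCTOR's actual syntax or compiler) that layout is defined by two local rules: a node's width is the sum of its children's widths, computed bottom-up, and a child's x-position is its parent's x plus the widths of its earlier siblings, computed top-down. Because each rule only reads a node's parent or children, all nodes on the same tree level are independent, so each level can be handed to a pool of workers in one parallel map.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    leaf_width: float = 0.0                       # intrinsic width, used only by leaves
    children: list = field(default_factory=list)
    width: float = 0.0                            # filled in by the bottom-up pass
    x: float = 0.0                                # filled in by the top-down pass


def levels(root):
    """Group nodes by depth, breadth-first."""
    result, frontier = [], [root]
    while frontier:
        result.append(frontier)
        frontier = [c for n in frontier for c in n.children]
    return result


def compute_width(node):
    # Rule: leaves use their intrinsic width; inner nodes sum their children.
    node.width = sum(c.width for c in node.children) if node.children else node.leaf_width


def place_children(node):
    # Rule: each child starts where its previous sibling ends.
    offset = node.x
    for child in node.children:
        child.x = offset
        offset += child.width


def layout(root, workers=4):
    lvls = levels(root)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Bottom-up: deepest level first; nodes within one level never depend on each other.
        for level in reversed(lvls):
            list(pool.map(compute_width, level))
        # Top-down: each parent writes only its own children's x, so there are no races.
        root.x = 0.0
        for level in lvls:
            list(pool.map(place_children, level))
    return root
```

In CPython, threads mainly illustrate the dependency structure here rather than deliver a speedup; a production system would need a runtime that executes each level-wide map truly in parallel.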


Visualizing Semantic Web Data Landscapes

A European team of researchers from Ireland, Cambridge, Maastricht University in the Netherlands, and the University of Bonn in Germany argues that the key to successfully applying Semantic Web technologies (SWT) to Life Sciences research is the availability of tools that lower the barrier to adoption for biomedical researchers.

Researchers need to be able to query and exploit, easily and intuitively, the wealth of data available behind SPARQL endpoints. To that end, the team presents SemScape, a Semantic Web-enabled plugin for Cytoscape, the popular network biology software.

SemScape can query any knowledge base that exposes a SPARQL endpoint. It builds on users' familiarity with existing software and keeps the exploration of big data intuitive through a mechanism that encapsulates the complexity of the data in parametric, context-dependent queries. The team believes SemScape can be a valuable resource for both data consumers and data publishers.
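The idea of a parametric, context-dependent query can be sketched as a SPARQL template whose placeholder is filled with the URI of whichever network node the user has selected, then sent to an endpoint following the SPARQL 1.1 Protocol (an HTTP GET carrying the query in a `query` parameter). The template, the `ex:` namespace, the `ex:interactsWith` predicate, and the endpoint URL are hypothetical placeholders, not part of SemScape; the code only builds the request URL and does not contact any server.

```python
from string import Template
from urllib.parse import urlencode, urlparse

# Hypothetical template: "find everything the selected entity interacts with".
# The ex: namespace and ex:interactsWith predicate are illustrative assumptions.
NEIGHBOURS_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/vocab#>
SELECT ?partner WHERE {
  <$node> ex:interactsWith ?partner .
}
LIMIT $limit""")


def build_query(node_uri: str, limit: int = 100) -> str:
    """Fill the template with the URI of the node the user selected."""
    parsed = urlparse(node_uri)
    # Reject non-http(s) values and characters that could break out of <...>.
    if parsed.scheme not in ("http", "https") or any(ch in node_uri for ch in '<>" {}|\\^`'):
        raise ValueError(f"not a usable IRI: {node_uri!r}")
    return NEIGHBOURS_TEMPLATE.substitute(node=node_uri, limit=int(limit))


def request_url(endpoint: str, query: str) -> str:
    """SPARQL 1.1 Protocol: pass the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    q = build_query("http://example.org/protein/P1", limit=10)
    print(request_url("http://example.org/sparql", q))
```

The user only ever supplies a selection in the network view; the SPARQL syntax stays hidden inside the template, which is the kind of encapsulation the paragraph describes.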


Archived Stream System for Big Data

A team of researchers from the National University of Defense Technology in China notes that a growing number of large-data applications, such as Web search engines, require highly available, round-the-clock tracking, storage, and analysis of large volumes of real-time user access logs.

The team argues that traditional transaction-processing solutions are not always efficient enough to archive streams arriving at such high rates. They present an integrated approach that stores these archived data streams in a database cluster and supports rapid recovery.

The method is based on a simple replication protocol combined with a high-performance data loading and query strategy. Experimental results show that the approach loads and queries data efficiently and achieves shorter recovery times than traditional database cluster recovery methods.
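The general shape of "simple replication plus fast loading plus quick recovery" can be sketched in memory. In this hypothetical model, which is not the NUDT team's actual protocol, each log record is routed to a partition by hashing its key, every partition lives on two consecutive nodes, batches are grouped by partition before being appended to every live replica, and a failed node is rebuilt by copying only the partitions it owns from a surviving peer. The class and method names are illustrative assumptions.

```python
from collections import defaultdict
import zlib


class ArchiveCluster:
    """Hypothetical append-only log archive with simple N-way replication."""

    def __init__(self, num_nodes=4, num_partitions=8, replicas=2):
        self.num_nodes = num_nodes
        self.num_partitions = num_partitions
        self.replicas = replicas
        self.alive = [True] * num_nodes
        # storage[node][partition] -> list of (key, value) records, append-only
        self.storage = [defaultdict(list) for _ in range(num_nodes)]

    def partition_of(self, key: str) -> int:
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        return zlib.crc32(key.encode()) % self.num_partitions

    def nodes_for(self, partition: int) -> list:
        return [(partition + i) % self.num_nodes for i in range(self.replicas)]

    def load_batch(self, records):
        """Group a batch by partition, then append each group once per live replica."""
        groups = defaultdict(list)
        for key, value in records:
            groups[self.partition_of(key)].append((key, value))
        for partition, group in groups.items():
            for node in self.nodes_for(partition):
                if self.alive[node]:
                    self.storage[node][partition].extend(group)

    def query(self, key: str) -> list:
        """Read from the first live replica of the key's partition."""
        partition = self.partition_of(key)
        for node in self.nodes_for(partition):
            if self.alive[node]:
                return [v for k, v in self.storage[node][partition] if k == key]
        raise RuntimeError("all replicas of this partition are down")

    def fail(self, node: int):
        self.alive[node] = False
        self.storage[node] = defaultdict(list)  # simulate losing the node's data

    def recover(self, node: int):
        """Rebuild only the partitions this node replicates, copying from a live peer."""
        for partition in range(self.num_partitions):
            owners = self.nodes_for(partition)
            if node not in owners:
                continue
            source = next((n for n in owners if n != node and self.alive[n]), None)
            if source is None:
                raise RuntimeError(f"no live replica to restore partition {partition}")
            self.storage[node][partition] = list(self.storage[source][partition])
        self.alive[node] = True
```

Grouping a batch by partition means each replica receives one bulk append per partition rather than one write per record, and recovery touches only the failed node's own partitions rather than the whole cluster.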


Improving Big Data Availability in Massive Databases

The same team from the National University of Defense Technology also published research arguing that, because of its huge scale and large number of components, big data is difficult to work with using relational databases, desktop statistics packages, and visualization packages.

Database replication technology is widely used to increase mean time to failure (MTTF), but few of these techniques address very large database systems. The team argues that traditional backup methods are not feasible at this scale, and that reducing mean time to repair (MTTR) incurs expensive manpower costs.

Based on an analysis of the characteristics of data in large databases, they propose a new method called the Detaching Read-Only (DRO) mechanism. It reduces MTTR by reducing the amount of data that physically changes in each database, separating data at the granularity of data nodes.

According to the researchers, analysis and experimental results show that their method can reduce MTTR by an order of magnitude. It reportedly requires no additional hardware and also reduces the high manpower costs.
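Why an order-of-magnitude MTTR reduction matters can be seen from the standard steady-state availability formula, A = MTTF / (MTTF + MTTR). The sketch below also makes a simplifying assumption of our own, not the researchers': recovery time is proportional to the amount of data that must be restored, and data detached as read-only does not need restoring, so MTTR shrinks in proportion to the writable fraction. All numbers are illustrative, not results from the study.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def recovery_hours(total_tb: float, writable_fraction: float, tb_per_hour: float) -> float:
    """Hypothetical model: only data that is still writable must be restored.

    Data already detached as read-only is assumed to survive a failure intact,
    so it adds nothing to recovery time.
    """
    return total_tb * writable_fraction / tb_per_hour


# Illustrative inputs only (not from the study).
MTTF = 2000.0                                # hours between failures
full = recovery_hours(100.0, 1.0, 2.0)       # restore all 100 TB at 2 TB/h
detached = recovery_hours(100.0, 0.1, 2.0)   # 90% of the data detached as read-only

if __name__ == "__main__":
    print(f"MTTR {full:.0f} h -> availability {availability(MTTF, full):.4f}")
    print(f"MTTR {detached:.0f} h -> availability {availability(MTTF, detached):.4f}")
```

With these assumed inputs, MTTR drops from 50 to 5 hours and availability rises from about 97.6% to about 99.75%, without changing MTTF at all.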
