January 19, 2013

The Week in Big Data Research


This week's big data research and development stories cover wide ground, as researchers from Europe, China, and the United States work out how to use big data to solve biomedical problems on this planet and to find and store data about other planets. Visualizing and archiving big data in large databases also got their share of attention this week.

Without further ado, here begins 2013's first Week in Big Data Research.

Big Planetary Census Data

Fifteen years ago, the salient features of the known extrasolar planets could be written down on an index card. At present the catalog of extrasolar planets numbers in the thousands, and the rate of detection is increasing rapidly.

Highly diverse planets are being identified through a range of observational techniques: photometric transit detection, Doppler radial velocimetry, gravitational microlensing, and direct detection via adaptive optics imaging are all producing discoveries at an increasing rate.

In a recent talk at the Intelligent Data Understanding Conference, Greg Laughlin presented an overview of the census as currently understood, then showed how the different detection methods are producing complementary detections.

Like many areas in astronomy, exoplanetary detection is facing issues related to “big data”. Large online repositories (such as that produced by the Kepler Mission) serve many terabytes of data, much of which has gone unanalyzed due to the time-consuming algorithms required. Laughlin’s talk sought to highlight the current issues, and showed how ad-hoc collaborations across the community are being formed to deal with the challenges (and the excitement) of this fast-moving area.

A Language for Big Data Visualization

Researchers from the University of California at Berkeley believe that increases in data availability are among the forces behind some recent innovations. However, according to the Berkeley team, visualization technology for exploring data is not keeping up.

They argue that designers may be forced to choose between scale and interactivity. The designers would prefer big displays because of the ability to show an entire data set. However, the Berkeley researchers note that viewers would prefer interactivity.

Performance constraints limit interactions to operating on a small data slice. The Berkeley team presented SUPERCONDUCTOR: a high-level visualization language for interacting with large data sets. It has three design goals: scale, interactivity, and productivity.

Through their presentation, the team showed how high-level programming abstractions support automatic parallelization. They examined three cases: selectors, layout, and rendering. In the case of layout, declarative constructs can further guide parallelization. Together, these ideas enabled their goal of high-level programming of big, interactive visualizations.

Visualizing Semantic Web Data Landscapes

A European team consisting of researchers from Ireland, Cambridge, Maastricht University in the Netherlands, and the University of Bonn in Germany argues that the key to successfully applying Semantic Web technologies (SWT) to Life Sciences research is the availability of tools that lower the barrier to adoption for biomedical researchers.

Researchers need to easily and intuitively exploit and query the wealth of data that is available behind SPARQL endpoints. To that end, the researchers present SemScape, a Semantic Web-enabled plugin for the popular network biology software Cytoscape.

SemScape can be used to query any knowledge base that exposes a SPARQL endpoint. It builds on researchers' familiarity with existing software and makes big data easier to exploit through a mechanism that encapsulates the complexity of the data in parametric, context-dependent queries. The team believes SemScape can provide a valuable resource both for data consumers and data publishers.
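To give a rough sense of what a parametric, context-dependent query can look like, here is a minimal Python sketch. It is not SemScape's implementation: the template, the function name `build_query`, and the example URI are all assumptions made for illustration. The idea is simply that a fixed SPARQL pattern is filled in with a value taken from the user's current context, such as a selected network node.

```python
from string import Template

# Hypothetical query template (not taken from SemScape): $node is filled in
# from the user's context, for example a node selected in a network view.
NEIGHBOR_QUERY = Template(
    "SELECT ?predicate ?object WHERE { <$node> ?predicate ?object } LIMIT $limit"
)


def build_query(node_uri: str, limit: int = 25) -> str:
    """Fill the template with a node URI and a result limit."""
    # Reject characters that would break out of the <...> IRI delimiters.
    if any(ch in node_uri for ch in '<>"{} '):
        raise ValueError("node_uri contains characters not allowed in an IRI")
    return NEIGHBOR_QUERY.substitute(node=node_uri, limit=int(limit))


# Illustrative URI only; example.org is a placeholder domain.
query = build_query("http://example.org/gene/BRCA1", limit=10)
print(query)
```

The resulting string could then be sent to any SPARQL endpoint. Keeping the pattern in a template means the user supplies only the context (here, a node), not the query syntax.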

Archived Stream System for Big Data

A team of researchers from the National University of Defense Technology in China found that a growing number of large-data applications, such as Web search engines, need highly available, around-the-clock tracking, storage, and analysis of large volumes of real-time user access logs.

The team argues that traditional solutions designed for common trading applications are not always efficient enough to archive streams arriving at such a high rate. They presented an integrated approach that saves these archived data streams in a database cluster and supports rapid recovery.

This method is based on a simple replication protocol along with a high-performance data loading and query strategy. Experimental results show that their approach loads and queries data efficiently and achieves shorter recovery times than traditional database cluster recovery methods.

Improving Big Data Availability in Massive Databases

The team from the National University of Defense Technology in China also published research claiming that, because of its huge scale and number of components, big data is difficult to work with using relational databases, desktop statistics, and visualization packages.

Database replication is widely used to increase mean time to failure (MTTF), but few such techniques address mean time to recovery (MTTR) in massive database systems. The team argues that traditional backup methods are not feasible at this scale, and that reducing MTTR with them requires expensive manpower.

Based on an analysis of the characteristics of data in massive databases, they propose a new mechanism called Detaching Read-Only (DRO). It reduces MTTR by limiting the amount of physically changed data in each database, separating data at node granularity.

According to the research, analysis and experimental results show that their method can reduce MTTR by an order of magnitude. Further, it requires no additional hardware, and it reportedly reduces the high manpower costs as well.
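To see why an order-of-magnitude drop in MTTR matters, consider the standard steady-state availability formula, MTTF / (MTTF + MTTR). The short Python sketch below applies it to made-up numbers. The 1,000-hour MTTF and the 10-hour and 1-hour MTTR values are assumptions chosen only for illustration and do not come from the research.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical figures, not results from the paper.
MTTF_HOURS = 1000.0
before = availability(MTTF_HOURS, 10.0)  # assumed MTTR of 10 hours
after = availability(MTTF_HOURS, 1.0)    # MTTR cut by an order of magnitude

print(f"before: {before:.4f}")  # before: 0.9901
print(f"after:  {after:.4f}")   # after:  0.9990
```

With MTTF held constant, cutting MTTR tenfold shrinks expected downtime by roughly the same factor, which is the effect recovery-focused methods aim for.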
