November 10, 2012

The Week in Big Data Research


Welcome to The Week in Research, covering what’s been on the academic and scientific big data radar in early November.

Unlike in last week's research news brief, news from the MapReduce and Hadoop side is slim this week. Instead, the most interesting items we found deal with approaches to meshing, integrating and drilling down through big data to meet specific organizational, functional and visual goals that help make sense of it.

Without further delay, let’s dive in with another NoSQL approach, albeit one that is targeted at a particular type of data and use case.

Punting a New NoSQL Approach

A group of researchers at the Tsinghua National Laboratory for Information Science and Technology in Beijing has been tackling the problems created by the move to digitization of large volumes of library data. While this might sound like a “simple” problem when one thinks of libraries as mere text and images, the group contends that the challenges are significant and require a novel approach.

To counter the issues, the Chinese researchers spent considerable effort creating a NoSQL database, which they call PuntStore, to improve and optimize the way traditional digital resource management systems handle metadata management, digital content storage and label management.

The team argues that as we move headlong into the era of big data, the standard architecture of digital resources needs to keep pace with massive, complex, heterogeneous and continuously changing data. More specifically, many traditional materials have been moved into digital library forms, which has created problems around everything from long-term storage and preservation to, of course, data management.

The researchers describe PuntStore in detail and put it in real-world context, describing how their system has been deployed successfully at the Chinese Science and Technology History library, where it solved the issue of managing heterogeneous and complex metadata. The group’s test results show that PuntStore could be an effective solution for similar application scenarios.

New Platform to Share Sensor Access

According to researchers from the Computer Science and Engineering department at Aalto University in Finland, complex event processing has a large number of uses, but it is dominated by proprietary systems and vertical products rather than open technologies.

They claim that as data grows through the Internet of Things and nearly everything we carry gains a “smart” component, there will be an even larger number of events that need to be processed in a multi-actor, multi-platform environment.

The group says that end-user applications could benefit from open access to all relevant sensors and data sources. They note that current work on semantic sensor networks relies on open technologies for harvesting and integrating this data, but they are looking for ways applications can more effectively access a shared set of sensors while avoiding redundant data acquisition that would hurt energy efficiency.

To solve these problems they propose a novel event processing platform based on the Rete algorithm, which they say offers continuous execution of interconnected SPARQL queries and update rules. The platform, called INSTANS, along with Rete, enables sharing of sensor access and caching of intermediate results in a natural and high-performance manner. The group says that with incremental query evaluation, standards-based SPARQL and RDF can handle complex event processing tasks in line with the shared sensor access goals they seek to achieve.
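As a rough illustration of incremental query evaluation with cached intermediate results, the sketch below joins two triple patterns on a shared subject as triples arrive one at a time. It is not INSTANS code and does not use SPARQL; the class name `IncrementalJoin` and the predicates `locatedIn` and `temperature` are hypothetical, chosen only for this example.

```python
from collections import defaultdict


class IncrementalJoin:
    """Joins two triple patterns on their shared subject, one triple at a time.

    Hypothetical sketch of the Rete idea: partial matches are cached so that
    each new triple is only compared against what has already been seen.
    """

    def __init__(self, left_predicate, right_predicate):
        self.left_predicate = left_predicate
        self.right_predicate = right_predicate
        # Cached partial matches ("intermediate results"), keyed by subject.
        self.left_memory = defaultdict(list)
        self.right_memory = defaultdict(list)

    def add(self, subject, predicate, obj):
        """Insert one triple and return only the new matches it completes."""
        new_matches = []
        if predicate == self.left_predicate:
            self.left_memory[subject].append(obj)
            new_matches = [(subject, obj, r) for r in self.right_memory[subject]]
        elif predicate == self.right_predicate:
            self.right_memory[subject].append(obj)
            new_matches = [(subject, l, obj) for l in self.left_memory[subject]]
        return new_matches


join = IncrementalJoin("locatedIn", "temperature")
first = join.add("sensor1", "locatedIn", "lab")     # no reading yet
second = join.add("sensor1", "temperature", 21.5)   # completes a match
third = join.add("sensor2", "temperature", 19.0)    # location unknown
fourth = join.add("sensor1", "temperature", 22.0)   # reuses cached location
```

Because the location of `sensor1` stays in memory, each new reading produces its match without re-scanning earlier triples, which is the property the researchers rely on for continuous queries.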

When Big Data Means Lost Data

David Maier and V.M. Megler from Portland State’s Department of Computer Science tackle another side of the big data challenge—this time looking at the issue from a meta-management perspective.

They note that in the past, scientists’ biggest concern was that they lacked enough data to carry out their work, but now the tables have turned. It’s not just that there’s too much to store or handle; narrowing down to what’s important can be so cumbersome that, if the data can’t be accessed, they might as well not have collected it at all.

The team used an existing scientific archive to test a possible solution to this problem by adapting information retrieval techniques that were developed for combing through scientific data in text format. Their approach uses a blend of automated and “semi-curated” methods to extract metadata from large archives of scientific data. They then search across these archives’ extracted metadata and have results returned that are ranked by similarity to the query terms.

The team puts this in the context of their work at an ocean observatory where they examined the effectiveness of the approach as well as the performance and scalability angles to see how continuous growth of those archives would affect their goals, with positive results.
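To make the idea of ranking datasets by similarity to the query terms concrete, here is a minimal sketch that scores extracted metadata against a query. The article does not say which similarity measure the researchers use; Jaccard term overlap is substituted here as a simple stand-in, and the function names and catalog entries are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two term collections (stand-in measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_datasets(query_terms, catalog):
    """Return dataset names ordered by descending similarity to the query.

    `catalog` maps a dataset name to its extracted metadata terms.
    Datasets sharing no terms with the query are dropped.
    """
    scored = [(jaccard(query_terms, terms), name) for name, terms in catalog.items()]
    scored = [(score, name) for score, name in scored if score > 0]
    # Highest score first; ties are broken alphabetically by name.
    return [name for score, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical catalog of extracted metadata.
catalog = {
    "cruise_2010": ["salinity", "temperature", "estuary"],
    "buoy_17": ["temperature", "wind"],
    "glider_3": ["oxygen", "depth"],
}
ranked = rank_datasets(["temperature", "salinity"], catalog)
```

The search runs over the small metadata catalog, not the archived data itself, which is what keeps this kind of lookup practical as archives grow.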

Visualizing Networks for Social Science Research

A team of researchers from the University of Southampton in the U.K. have presented an approach to harvesting and visualization of massive volumes of data to render it usable for social science research. To put their research in context, the group focused on data from Twitter to demonstrate their methodological approach.

The goal was to build a new software tool that can visualize, by category, the social networks that emerge within a larger network. The group says that for what they’ve attempted to demonstrate, Twitter is an ideal test, as it is a dynamic social network that offers immediately visible traces of apparently spontaneous social interactions and relationships. The “hashtags” of Twitter allow them to better understand the networks and social interactions that play out.

Using data from Twitter, they have enabled a timeline of communications around a specific hashtag to be visualized based on retweets, thus allowing them to identify influential tweets and tweeters within this micro-network over time and to see how networks form. They can then interact with this data, pausing at certain moments in time and zooming in to view more information and understand individual and group roles.

“This is not only providing a detailed understanding of the communications between actors, but exploring the pathways and flow of information pushes the methodological approach to understanding big data in terms of its dynamic nature, something that helps explain the translation and process of a network.”
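The retweet timeline described above can be approximated in a few lines. The sketch below is not the Southampton tool; it assumes a hypothetical list of `(timestamp, retweeter, original_author)` records for one hashtag and counts how many retweets each author has received up to a chosen moment, which mirrors the idea of pausing the timeline to see who is influential.

```python
from collections import Counter


def influence_at(retweets, until):
    """Count retweets received per original author up to and including `until`.

    `retweets` is an iterable of (timestamp, retweeter, original_author).
    Returns (author, count) pairs, most retweeted first.
    """
    counts = Counter(author for ts, _retweeter, author in retweets if ts <= until)
    return counts.most_common()


# Hypothetical retweet records for a single hashtag.
retweets = [
    (1, "bob", "alice"),
    (2, "carol", "alice"),
    (3, "dave", "erin"),
    (4, "alice", "erin"),
    (5, "bob", "erin"),
]
early = influence_at(retweets, until=2)
late = influence_at(retweets, until=5)
```

Comparing the two snapshots shows influence shifting from one account to another as the conversation develops.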

The Emergence of Discovery Informatics

Yolanda Gil from the University of Southern California and Haym Hirsh from Rutgers University are seeking novel ways to integrate further artificial intelligence technologies into the next generation of scientific research.

The duo recently described the concept of “Discovery Informatics” which they say is an emerging area of research focused on computing advances that target scientific discovery processes requiring knowledge assimilation and reasoning, and applying principles of intelligent computing and information systems to understand, automate, improve and innovate any aspects of those processes.

The authors discuss the potential of Discovery Informatics as it relates to science, dealing with both the big data angle and the “long tail” of science. To highlight their points, the team focuses on two areas of research for information and intelligence systems: workflows of scientific processes and citizen science, which they say are two of the best application areas for intelligent systems to support scientific discovery processes.

While this is more of a theory-based article, it’s nonetheless interesting from a research perspective as it represents the new ways of thinking that are emerging in the wake of the ever-growing influx of big, complex data.

 
