
February 23, 2013

The Week in Research

This week’s selection of research items from the data-intensive computing ecosystem includes new ways of visualizing streaming text data, lifelong machine learning, a scalable approach to generating synthetic social graphs, and a look at the future of RDF and SPARQL in big data environments.

Let’s dive in with our first item:

Visualizing Streaming Text Data

A team from AT&T Labs has focused on the endless streams of text that make it increasingly difficult to analyze data and discover relevant information. To address this, they propose an approach for visualizing text streams in real time, presented as a dynamic graph with an associated map.

The group says that this approach automatically groups similar messages into “countries” with keyword summaries, using semantic analysis, graph clustering and map generation techniques. They say it maintains visual stability over time through dynamic graph layout and Procrustes projection techniques, enhanced with a “novel stable component packing algorithm.”
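To make the role of Procrustes projection more concrete, here is a minimal sketch of generic 2D Procrustes alignment, not the AT&T team's implementation. It assumes the same nodes appear in both the previous and the new layout, matched by list index, and finds the rotation, uniform scale and translation that bring the new layout as close as possible (in the least-squares sense) to the previous one, so that a redrawn map does not appear to spin or jump between frames. The function name `procrustes_align` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def procrustes_align(old: List[Point], new: List[Point]) -> List[Point]:
    """Rotate, uniformly scale and translate `new` to best match `old`.

    Assumption: point i in `new` is the same node as point i in `old`.
    Reflections are not allowed; only proper rotations are considered.
    """
    n = len(old)
    if n == 0 or n != len(new):
        raise ValueError("layouts must be non-empty and of equal length")

    # Centroids of both layouts.
    ox, oy = sum(p[0] for p in old) / n, sum(p[1] for p in old) / n
    nx, ny = sum(p[0] for p in new) / n, sum(p[1] for p in new) / n

    # Center both layouts on the origin.
    a = [(x - ox, y - oy) for x, y in old]
    b = [(x - nx, y - ny) for x, y in new]

    # For a rotation by angle t, sum_i a_i . R(t) b_i = cos(t)*c + sin(t)*s,
    # which is maximized at t = atan2(s, c).
    c = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    s = sum(ay * bx - ax * by for (ax, ay), (bx, by) in zip(a, b))
    theta = math.atan2(s, c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Optimal uniform scale: (sum a_i . R b_i) / (sum |b_i|^2).
    norm_b = sum(bx * bx + by * by for bx, by in b)
    scale = math.hypot(c, s) / norm_b if norm_b > 0 else 1.0

    # Apply rotation and scale, then move onto the old layout's centroid.
    return [
        (ox + scale * (cos_t * bx - sin_t * by),
         oy + scale * (sin_t * bx + cos_t * by))
        for bx, by in b
    ]
```

In a streaming setting, each freshly computed layout would be aligned to the previous frame before drawing; nodes that are new to the frame would need separate handling, which this sketch leaves out.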

The result, they say, offers a continuous, accurate view of evolving topics of interest. They demonstrate the approach in an online service called TwitterScope.



Lifelong Machine Learning

According to Qiang Yang from Huawei Technologies, the flood of new data types requires more robust data-mining systems that can keep pace with continually changing data.

Yang discusses how this creates a need for lifelong machine learning, which, unlike traditional one-shot learning, should be able to identify the learning tasks at hand and adapt to new learning problems in a sustainable manner.

More specifically, a foundation for lifelong machine learning is transfer learning, whereby knowledge gained in a related but different domain is transferred to benefit learning for the current task. To make transfer learning effective, he argues, it is important to maintain a continual and sustainable channel over a user's lifetime through which data are annotated.
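As an illustration of the basic idea of transfer learning, the sketch below uses one simple, well-known form of parameter transfer rather than any method from Yang's work: a logistic-regression model is trained on a plentiful source domain, and its weights then serve both as the starting point and as a regularization anchor when training on a handful of labeled examples from a related target domain. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label in {0, 1})


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def with_bias(x: Sequence[float]) -> List[float]:
    # Append a constant feature so the last weight acts as the bias.
    return list(x) + [1.0]


def train_logreg(data: List[Example], init: Optional[List[float]] = None,
                 prior: Optional[List[float]] = None, lam: float = 0.0,
                 lr: float = 0.1, epochs: int = 300) -> List[float]:
    """Full-batch gradient descent on mean logistic loss + (lam/2)*||w - prior||^2.

    With prior = source-domain weights, the penalty pulls the target model
    toward what was already learned (a simple form of parameter transfer).
    """
    dim = len(data[0][0]) + 1
    w = list(init) if init is not None else [0.0] * dim
    anchor = list(prior) if prior is not None else [0.0] * dim
    n = len(data)
    for _ in range(epochs):
        grad = [lam * (wi - ai) for wi, ai in zip(w, anchor)]
        for x, y in data:
            xb = with_bias(x)
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, xb))) - y
            for k in range(dim):
                grad[k] += err * xb[k] / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def accuracy(w: List[float], data: List[Example]) -> float:
    hits = sum(
        (sigmoid(sum(wi * xi for wi, xi in zip(w, with_bias(x)))) >= 0.5) == (y == 1)
        for x, y in data
    )
    return hits / len(data)


# Hypothetical toy domains: many source labels, only four target labels.
SOURCE = [((float(i), float(j)), 1 if i + j > 0 else 0)
          for i in range(-5, 6) for j in range(-5, 6) if i + j != 0]
TARGET = [((3.0, -1.0), 1), ((-3.0, 1.0), 0), ((1.0, 2.0), 1), ((-1.0, -2.0), 0)]

if __name__ == "__main__":
    w_src = train_logreg(SOURCE)
    w_tgt = train_logreg(TARGET, init=w_src, prior=w_src, lam=1.0, epochs=100)
    print("source weights:", w_src)
    print("target accuracy:", accuracy(w_tgt, TARGET))
```

In a lifelong setting, the weights learned for one task could in turn become the `prior` for the next, which is one simple way to keep accumulated knowledge flowing from task to task.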

Yang outlines lifelong machine learning scenarios, gives several examples of transfer learning and of applications for lifelong machine learning, and discusses cases in which data annotations were successfully extracted to meet the big data challenge.



A Scalable Social Graph Generator  

According to a research team from the European organizations CWI and OpenLink Software, benchmarking graph-oriented database workloads and graph database systems is becoming increasingly relevant for analytical big data tasks such as social network analysis.

They argue that in graph data, structure lies less inside the nodes themselves than in the way nodes are connected, i.e., in structural correlations. Because such structural correlations determine the join fan-outs experienced by graph analysis algorithms and graph query executors, they are an essential, yet typically neglected, ingredient of synthetic graph generators.

To address this, the team presents S3G2, a Scalable Structure-correlated Social Graph Generator. It creates a synthetic social graph containing non-uniform value distributions and structural correlations, intended as test data for scalable graph analysis algorithms and graph database systems. The researchers generalize the problem by decomposing correlated graph generation into multiple passes, each focusing on one so-called correlation dimension, and each of which can be mapped to a MapReduce task.
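The multi-pass idea can be sketched in a few lines. The version below is loosely modeled on the description above and is not S3G2's code: each pass sorts people by one correlation dimension (here a hypothetical `university` and `birth_year` attribute), so that similar people end up next to each other, then links each person to a few neighbors in that order with a probability that falls off with distance. The window size, the halving probability and all names are illustrative assumptions.

```python
import random
from typing import Callable, Dict, Iterable, List, Set, Tuple

Person = Dict[str, int]
Edge = Tuple[int, int]


def window_pass(people: List[Person], key: Callable[[Person], int],
                window: int, base_prob: float, rng: random.Random) -> Set[Edge]:
    """One pass along a single correlation dimension.

    Sorting by `key` places similar people close together; each person is then
    linked to each of the next `window` people with a probability that halves
    for every extra step of distance (an assumed decay, for illustration).
    """
    ordered = sorted(people, key=lambda p: (key(p), p["id"]))  # deterministic ties
    edges: Set[Edge] = set()
    for i, person in enumerate(ordered):
        for dist in range(1, window + 1):
            j = i + dist
            if j >= len(ordered):
                break
            if rng.random() < base_prob * 0.5 ** (dist - 1):
                a, b = person["id"], ordered[j]["id"]
                edges.add((min(a, b), max(a, b)))  # undirected, no duplicates
    return edges


def generate(people: List[Person], dimensions: Iterable[Callable[[Person], int]],
             window: int, base_prob: float, seed: int = 42) -> Set[Edge]:
    """Run one pass per correlation dimension and union the edges.

    In a distributed setting, each pass (sort by key, then scan) could be
    expressed as a separate MapReduce job.
    """
    rng = random.Random(seed)
    edges: Set[Edge] = set()
    for key in dimensions:
        edges |= window_pass(people, key, window, base_prob, rng)
    return edges


if __name__ == "__main__":
    people = [{"id": i, "university": i % 3, "birth_year": 1980 + (i * 7) % 20}
              for i in range(100)]
    dims = [lambda p: p["university"], lambda p: p["birth_year"]]
    print(len(generate(people, dims, window=5, base_prob=0.8)), "edges")
```

Because edges are created only between people who are close in some sorted order, friends tend to share attributes, which is the kind of structural correlation the S3G2 authors argue a generator must reproduce.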

The team demonstrates that S3G2 can generate social graphs that (i) share well-known connectivity characteristics typically found in real social graphs, (ii) contain plausible structural correlations that influence the performance of graph analysis algorithms and queries, and (iii) can be generated quickly at huge sizes on common cluster hardware.



What RDF and SPARQL Bring to Big Data

According to Bob DuCharme, a solution architect at Virginia-based TopQuadrant, there is still a solid future ahead for RDF.

He argues that the Resource Description Framework (RDF), a W3C standard since 1999, has innate simplicity and flexibility: it describes a data model that can represent most known structured and semi-structured data formats.

He notes that the accompanying standards, such as the SPARQL query language and an optional schema language, also provide a strong infrastructure for addressing many of the issues that distinguish big data from traditional relational database management.
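To show what the RDF data model and a SPARQL-style query look like at their simplest, here is a toy in-memory sketch; it is not a real triplestore or SPARQL engine. Data is a list of (subject, predicate, object) triples, and a query is a list of triple patterns in which terms starting with `?` are variables, evaluated by joining pattern matches, much as a SPARQL basic graph pattern is. The `ex:` names and the data are hypothetical. In SPARQL, the query in the example would read roughly `SELECT ?person ?org WHERE { ?person ex:worksFor ?org . ?org ex:locatedIn ex:Virginia . }`, with an `ex:` prefix declared.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None  # variable already bound to a different value
            result[p] = t
        elif p != t:
            return None  # constant term does not match
    return result


def query(triples: Iterable[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns (no OPTIONAL, FILTER, etc.)."""
    data = list(triples)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [ext for b in bindings for t in data
                    if (ext := match_pattern(pattern, t, b)) is not None]
    return bindings


DATA: List[Triple] = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:initech"),
    ("ex:acme", "ex:locatedIn", "ex:Virginia"),
    ("ex:initech", "ex:locatedIn", "ex:Texas"),
    # A new kind of fact needs no schema change: just add another triple.
    ("ex:alice", "ex:nickname", '"Al"'),
]

if __name__ == "__main__":
    print(query(DATA, [("?person", "ex:worksFor", "?org"),
                       ("?org", "ex:locatedIn", "ex:Virginia")]))
```

The last triple illustrates the flexibility DuCharme describes: new properties can be added alongside existing data without altering any predefined table structure.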

DuCharme says that because of these features, both open-source projects and commercial vendors such as IBM, Oracle, and Cray have found RDF technology to be an excellent platform for taking an agile approach to large, dynamic aggregations of data that won't fit neatly into predefined tables.

Because RDF technology is built entirely on public standards, offerings from more specialized vendors, such as the AllegroGraph and Stardog triplestores and TopQuadrant's TopBraid application platform, can be mixed and matched with Cray, IBM, and Oracle offerings as well as with open-source tools. The result is applications that can start small and grow incrementally to trillions of triples.
