April 5, 2013

This Week in Research

Isaac Lopez

Our trip around the big data world of research this week covers real-time visualization of billion-record datasets, data analysis in crime fighting, and a framework that claims to run SQL queries faster than the industry standards.

We kick off our week in research examining massive data and space flights…

Spacecraft Operational Intelligence

A critical aspect of any successful flight project is the ability to correctly interpret raw telemetry data in order to monitor the health of the spacecraft and resolve anomalous behavior, says researcher Victor Hwang at the Jet Propulsion Laboratory at the California Institute of Technology.

The main obstacle, he says, comes from organizing massive volumes of data from both the spacecraft and multiple ground data systems into time-discrete trends and patterns. Hwang’s research focuses on testing the commercial system offered by Splunk to unify arbitrary data formats into one uniform system in order to increase operational visibility.

The result, says Hwang, is the ability to index arbitrarily formatted data while gaining architectural flexibility that enables the packaging of real-time telemetry streams into intuitive web-based interfaces. Users can begin with large quantities of telemetry data and systematically define filters until only the desired data is left.

Billion Record Data Visualization in Real Time

Researchers from Stanford University and Tsinghua University recognize the challenge that data analysts face as datasets grow, sometimes to a billion or more records, and conclude that traditional data visualization tools are often inadequate to handle big data. This is particularly true, they say, in cases where real-time visualization and interaction are required.

The researchers say that big data visualization must address two chief challenges: perceptual scalability and interactive scalability.

Their research focuses on using data reduction methods (such as binned aggregation or sampling) to produce scalable visual summaries, and on implementing techniques for interactive querying in imMens, a browser-based visual analytics system that uses WebGL for data processing and rendering on the GPU.
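
To illustrate the idea of binned aggregation mentioned above, here is a minimal sketch in Python. It is not the researchers' code and does not reflect how imMens works internally; the function name binned_counts and the sample points are hypothetical. It only shows the general principle: instead of drawing every record, the data is reduced to counts per grid cell, so the size of the visual summary depends on the number of bins rather than the number of records.

```python
from collections import Counter

def binned_counts(points, x_range, y_range, bins):
    """Reduce (x, y) points to counts on a bins-by-bins grid (hypothetical helper)."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the plotted area
        # Map each coordinate to a bin index; a point exactly on the
        # upper edge is clamped into the last bin.
        i = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        j = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        counts[(i, j)] += 1
    return counts

# Made-up sample data: five points, one of which falls outside the range.
points = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.95), (1.0, 1.0), (2.0, 0.5)]
grid = binned_counts(points, (0.0, 1.0), (0.0, 1.0), bins=4)
for cell, n in sorted(grid.items()):
    print(cell, n)
```

A renderer would then color each grid cell by its count. However many records are fed in, at most bins × bins cells need to be drawn.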

The result, they say, is sustained performance of 50 frames per second for brushing and linking, with imMens enabling data analysts to interactively examine summaries of billions of data records in real time. According to the researchers, this is the first system to achieve real-time brushing on data sets this large.

Predictive Crime Fighting

Technology has revolutionized the way police work has been done over the last 50 years, says researcher Katina Michael of the University of Wollongong, and with the introduction of social networks and mobile computing, a paperless paradigm is taking root.

In place of a more centralized police paradigm, a decentralized architecture is rising with distributed processing that enables the sharing of digital records across different jurisdictions. With the rise of this new paradigm, Michael suggests that new opportunities for semantically processed data can be seized – namely, predictive crime fighting.

Michael examines ethical questions about predictive crime fighting and the long-term psychological effects that such techniques may have on individuals and communities. She also looks at how such analysis may change policing paradigms, leading to an eventual state of what she calls uberveillance.

Processing SQL with the Fishes

Researchers at the University of California at Berkeley say they have developed a faster framework that offers better fault tolerance and more complex analytics capabilities than Hadoop or Hive.

The researchers claim that by employing a novel distributed memory abstraction, they are able to build a unified engine that can run SQL queries and sophisticated analytics functions (such as iterative machine learning) at scale. The system, they say, recovers efficiently from failures mid-query.

The researchers, who have dubbed the framework Shark, claim that it runs SQL queries up to 100 times faster than Apache Hive. They also claim that it runs machine learning programs up to 100 times faster than Hadoop, all while retaining a MapReduce-like execution engine.

Datanami