April 19, 2013

This Week in Research

In this week’s research roundup, we examine the use of big data to match names to faces to help fight fraud and other crimes, authenticating the credibility of citizen reporters, the question of what to do with data that doesn’t fit the MapReduce framework, and big data for bird watching. Get out your binoculars and bird calls – it’s This Week in Research.

 

Using Names as Facial Identifiers

“When you hear a first name like ‘Bob,’ do you imagine a face with a certain appearance?” ask researchers at the Advanced Multimedia Processing Lab at Cornell University. The researchers have studied the statistical relationships between names and faces, and say they have reached some surprising results – their facial recognition algorithm was able to predict the correct first names of test faces at rates far greater than chance.

Using tagged images from social media sites such as Flickr, the researchers built models for 100 common first names used in the United States. By extracting facial features into grids, they were able to construct facial averages for the selected names. Using these data sets, the algorithm can analyze a photo and predict the individual’s name. The result, say the researchers, is a system that guesses the correct first name more than four times as often as random guessing, and more than twice as often as random guessing when the gender is assumed to be known.
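To put those multiples in perspective, here is a rough back-of-the-envelope sketch of the chance baselines. It assumes every one of the 100 names is equally likely, and that the names split evenly between two genders; neither assumption comes from the researchers, and the function and variable names are purely illustrative.

```python
# Rough arithmetic only: chance baselines for a name-guessing task.
# Assumptions (not from the study): names are equally likely, and the
# 100 names split evenly by gender (50 each).

def chance_rate(num_names):
    """Probability of guessing the right name by picking one uniformly at random."""
    return 1.0 / num_names

NUM_NAMES = 100  # number of first names modeled, per the article

baseline_all = chance_rate(NUM_NAMES)          # guessing among all names
baseline_gender = chance_rate(NUM_NAMES // 2)  # guessing among names of one gender

# The article reports "more than 4x" and "more than 2x" these baselines,
# so these values are lower bounds implied by the assumptions above.
floor_all = 4 * baseline_all
floor_gender = 2 * baseline_gender

print(f"chance, all names:    {baseline_all:.1%}")
print(f"chance, gender known: {baseline_gender:.1%}")
print(f"implied accuracy floor, all names:    {floor_all:.1%}")
print(f"implied accuracy floor, gender known: {floor_gender:.1%}")
```

Under these assumptions the absolute accuracy is still only a few percent, which is consistent with the researchers' own caution about expecting perfection.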

While the researchers admit that it is unrealistic to expect perfection from such a system, even an imperfect version of the technology could have a broad range of applications in security and biometrics.

A demo of the utility can be seen here.

 


Authenticating Citizen Reporting

The rise of social media has led to the rise of the citizen reporter, as evidenced during the manhunt for the Boston Marathon bombing suspects, when Twitter user Andrew Kitzenberg captured the attention of the twitterverse with his on-the-scene updates on the shootout happening in his driveway.

Using (what they refer to as) “big data” techniques, researchers from Wellesley College examine the implications of this rising trend, and how data might be used to assess the credibility of these reports, especially in situations where public safety may depend on the information that they are disseminating.

The researchers collected and analyzed data produced by citizens in Monterrey, Mexico using the “#MTYfollow” tag. While information about local matters such as weather and traffic is frequently transmitted on this tag, serious events such as firefights between drug cartels are also reported.

In looking for ways to evaluate the credibility of information received through these real-time social media channels, the researchers examine various facets of the phenomenon, with the aim of establishing semi-automatic algorithms that can measure the trustworthiness of such information and its sources.

 


MapReduce Good Enough?

“If all you have is a hammer, throw away everything that is not a nail,” exclaims Jimmy Lin at the iSchool, University of Maryland.

“Hadoop is currently the large-scale data analysis ‘hammer’ of choice, but there exist classes of algorithms that aren’t ‘nails’ in the sense that they are not particularly amenable to the MapReduce programming model,” writes Lin, who suggests, perhaps controversially, that rather than creating alternative programming models or MapReduce extensions to deal with this data, it should simply be thrown out.

Lin explores three large classes of problems that he says serve as the poster children for “MapReduce bashing,” including iterative graph algorithms (e.g., PageRank), gradient descent (e.g., for training logistic regression classifiers), and expectation maximization (e.g., for training hidden Markov models, k-means).
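To illustrate why iterative algorithms are the usual target of such criticism, here is a minimal sketch of PageRank written as repeated map and reduce passes in plain Python. It is not Lin's code and does not use Hadoop; the function names, the tiny three-page graph, the damping factor of 0.85, and the fixed 20 iterations are all assumptions chosen for illustration.

```python
from collections import defaultdict

def map_phase(graph, ranks):
    """Map step: each page splits its current rank evenly among its out-links."""
    for page, links in graph.items():
        for target in links:
            yield target, ranks[page] / len(links)

def reduce_phase(pairs, pages, damping=0.85):
    """Reduce step: sum the contributions per page and apply the damping factor."""
    sums = defaultdict(float)
    for target, contribution in pairs:
        sums[target] += contribution
    n = len(pages)
    return {p: (1 - damping) / n + damping * sums[p] for p in pages}

def pagerank(graph, iterations=20, damping=0.85):
    """Run a fixed number of map/reduce passes.

    Assumes every page has at least one out-link (no dangling nodes).
    """
    pages = list(graph)
    ranks = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # On a MapReduce cluster, each pass here would be a separate job
        # that rereads the graph and writes the new ranks back to storage.
        ranks = reduce_phase(map_phase(graph, ranks), pages, damping)
    return ranks

# Hypothetical toy graph: page -> pages it links to.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
for page, rank in sorted(ranks.items()):
    print(page, round(rank, 4))
```

A single pass fits the map/reduce pattern naturally; the friction comes from the outer loop, since the model itself has no notion of iterating until convergence.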

Lin ultimately concludes that a two-pronged approach to the development of big data systems and frameworks should be pursued. “On one hand, we should perfect the hammer we already have by improving its weight balance, making a better grip, etc. On the other hand, we should be developing jackhammers – entirely new ‘game changers’ that can do things MapReduce and Hadoop fundamentally cannot do. In my opinion, it makes less sense to work on solving classes of problems for which Hadoop is already ‘good enough.’”

You can read the full article here.

 


Big Data Bird Watching

Advances in data analytics are affecting all of the conventional areas of science and industry, but they are also reaching into domains that are somewhat esoteric – including bird watching. The community of ornithologists, bird conservationists, and researchers has millions of bird watchers at its disposal, with programs already underway at the Cornell Laboratory of Ornithology.

Taking this lead, researcher Frank Kendlin at the Dublin Institute of Technology examines how data visualization can be used to influence and motivate a “citizen science” movement towards generating “big data” in Ireland and the UK, along the lines of what is already happening in North America.
