August 29, 2012

Let the BioGames Begin


There may be a shortage of data scientists in this world, but there is no shortage of people who enjoy computer games. If designed properly, a computer game may even assist in crowd-sourcing solutions to problems researchers may not have time for.

This is the premise of “Crowd-sourced BioGames: Managing The Big Data Problem for Next-Generation Lab-on-a-Chip Platforms,” a paper published by UCLA’s Ozcan Research Group.

The group, headed by Dr. Aydogan Ozcan, created a game that walks the user through simple instructions and sample images of malaria-infected cells before letting them loose on ‘decks’ of cell images. The user’s goal is to determine whether each cell is infected with malaria (the group chose malaria because of its prevalence among the world’s impoverished).

Overall, the game itself is very simple. It takes about two or three minutes, if one is paying attention and reading carefully, to read and understand the instructions. The paper requires the player to identify a minimum of 100 cells to be considered a “committed gamer” and have their data collected. Since the game shows the player a card of 24 cell images at a time and evaluating a cell is a simple matter of clicking on the infected ones, a user can reasonably achieve this status in about thirty seconds.

Further, each deck consists of 21 cards: 20 cards of 24 images and a 21st card that holds only 20, for a total of 500 images. While the player’s success rate after each card is tracked by a color-coded performance bar (red for terrible, yellow for mediocre, green for excellent), the incentive is to reach the end of the deck so the user can see his or her specific identification rates. With five minutes’ worth of care, it is not difficult to achieve a 90% success rate.

This is exactly what the Ozcan Research Group was going for. Though the game is not entertaining enough to hold the average person’s attention the way Angry Birds or Temple Run does, it is scientifically interesting enough to command one’s attention for five to ten minutes, or 500-1000 cells. According to the study, the game attracted 2,150 gamers (989 of whom were ‘committed’) from 77 countries, who together produced 1.5 million cell diagnoses over a period of only three months.

The results were impressive and promising. “Combining the responses of these 989 gamers using MAP estimation, we were able to achieve an accuracy of 98.13% when compared to the ground truth data (generated by the consensus of 9 medical experts).” While the accuracy drops when identifying the cells that are actually infected, it is still high enough to be statistically significant. “In our BioGames experiments, the PPV was 76.85%, meaning that more than three quarters of the cells that were labelled as infected, were indeed infected. We also achieved an NPV of 98.78%, such that almost all of the cells labelled as negative are correctly labeled as such.”
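To make the three quoted measures concrete, the short Python sketch below computes accuracy, PPV, and NPV from a confusion matrix. The function name and the counts are invented for illustration and are not taken from the paper; they only show how PPV can sit well below NPV when infected cells are rare.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return accuracy, PPV and NPV from confusion-matrix counts.

    tp/fp: cells labelled infected that were / were not infected
    tn/fn: cells labelled healthy that were / were not healthy
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),  # share of "infected" labels that are right
        "npv": tn / (tn + fn),  # share of "healthy" labels that are right
    }


# Hypothetical counts (an assumption, not the study's data):
# 1,000 cells, of which only 40 are truly infected.
metrics = diagnostic_metrics(tp=30, fp=10, tn=950, fn=10)
print(metrics)
```

With these made-up counts, ten wrong "infected" labels are enough to pull PPV down to 75%, while ten wrong "healthy" labels barely dent NPV because healthy cells are so numerous.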

This all helps alleviate the big data problem for several reasons. For one, humans are significantly better at image discrimination than computers. It may take several months and a significant amount of money to program a computer to diagnose malarial cells with the accuracy that a human achieves in five minutes. Also, the only data left to collect after this enterprise is statistical data on which cells have high instances of reported infection.

Crowd-sourcing is not a new phenomenon in problem-solving. When websites verify a user’s humanity by having him or her identify a blurred word, a second word is usually given. That second word’s identity is not known to the website, but is rather a word from an ancient manuscript that its researchers hope to decipher. The analogous control here is that every 25th cell has a known identity, such that the players may be properly evaluated and the stronger players’ decisions carry more weight.
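One way to picture how control cells could let stronger players carry more weight is a weighted vote. The Python sketch below is a simplified illustration, not the paper’s actual method: it assumes players err independently and that a cell is equally likely to be infected or healthy beforehand, in which case weighting each vote by the log-odds of the player’s accuracy gives the most probable answer. All names and answers are hypothetical.

```python
import math


def player_weight(control_answers, control_truth):
    """Log-odds weight from a player's accuracy on control cells."""
    correct = sum(a == t for a, t in zip(control_answers, control_truth))
    # Laplace smoothing keeps the accuracy strictly between 0 and 1,
    # so the logarithm below is always defined.
    accuracy = (correct + 1) / (len(control_truth) + 2)
    return math.log(accuracy / (1 - accuracy))


def combine_votes(votes, weights):
    """votes maps player -> True (infected) / False (healthy) for one cell."""
    score = sum(weights[p] * (1 if v else -1) for p, v in votes.items())
    return score > 0  # True means the crowd calls the cell infected


# Hypothetical control cells with known identities.
truth = [True, False, False, True]
weights = {
    "strong": player_weight([True, False, False, True], truth),  # 4 of 4
    "weak_a": player_weight([True, False, False, False], truth),  # 3 of 4
    "weak_b": player_weight([True, True, False, True], truth),    # 3 of 4
}

# One unknown cell: the strong player disagrees with both weaker players.
votes = {"strong": True, "weak_a": False, "weak_b": False}
decision = combine_votes(votes, weights)
print(decision)
```

In this toy case a simple majority would call the cell healthy, but the weighted vote sides with the player who answered every control cell correctly.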

Of course, it helps in crowd-sourcing to have a binary response problem. Here, either a cell is infected with malaria or it is not. People would almost assuredly struggle if there were more options. For example, if they were asked to choose among malaria, sickle-cell, and healthy, the accuracy would go down, even if sickle-cell is relatively easy to spot. Progress could still probably be made but it would likely not be as effective.

This brings up the next point that makes malaria diagnosis a good candidate for crowd-sourcing: ease of identification. It takes about three minutes on average to train humans to diagnose malaria in a cell with 78-98% accuracy. Obviously, other diseases are not so simple to spot.

With all that said, crowd-sourcing in this case delegates a tedious, daunting task to a large group that, collectively, does nearly as well as medical professionals. Carried to its logical end, this would free those professionals to identify more advanced diseases and lessen the strain on incoming big data healthcare analytics platforms.
