There may be a shortage of data scientists in this world, but there is no shortage of people who enjoy computer games. If designed properly, a computer game may even assist in crowd-sourcing solutions to problems researchers may not have time for.
This is the premise upon which “Crowd-sourced BioGames: Managing The Big Data Problem for Next-Generation Lab-on-a-Chip Platforms,” a paper published by UCLA’s Ozcan Research Group, is built.
The group, headed by Dr. Aydogan Ozcan, created a game that walks the user through simple instructions, including sample images of malaria-infected cells, before letting them loose on ‘decks’ of cell images. The user’s goal is to determine whether each cell is infected with malaria (the group chose malaria because of its prevalence among the world’s impoverished).
Overall, the game itself is very simple. A careful player can read and understand the instructions in two or three minutes. The study requires the player to identify a minimum of 100 cells to be considered a “committed gamer” and have their data collected. Since the game shows the player a card of 24 cell images at a time, and evaluating a cell is a simple matter of clicking on the infected ones, a user can reasonably achieve this status in about thirty seconds.
Further, each deck consists of 21 cards, the 21st of which holds only 20 images, for a total of 500 images per deck. While the player’s success rate after each card is tracked by a color-coded performance bar (red for terrible, yellow for mediocre, green for excellent), the real incentive is to reach the end of the deck, at which point the game reveals the user’s specific identification rates. With five minutes’ worth of care, it is not difficult to achieve a 90% success rate.
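The deck arithmetic above is easy to verify in a couple of lines (the variable names here are mine, not the game’s):

```python
# Deck structure as described in the article:
IMAGES_PER_CARD = 24   # a full card shows 24 cell images
FULL_CARDS = 20        # cards 1 through 20 are full
LAST_CARD_IMAGES = 20  # the 21st card holds only 20 images

total_images = FULL_CARDS * IMAGES_PER_CARD + LAST_CARD_IMAGES
print(total_images)  # 500 images per deck
```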
This is exactly what the Ozcan Research Group was going for. Though the game is not entertaining enough to addictively command an average human’s attention span like Angry Birds or Temple Run, it is scientifically interesting enough to command one’s attention for five to ten minutes—or 500–1000 cells. According to the study, the game attracted 2,150 gamers (989 of whom were ‘committed’) from 77 countries, who combined to produce 1.5 million cell diagnoses over a period of only three months.
The results were impressive and promising. “Combining the responses of these 989 gamers using MAP estimation, we were able to achieve an accuracy of 98.13% when compared to the ground truth data (generated by the consensus of 9 medical experts).” While the accuracy drops when identifying the cells that are actually infected, it remains high enough to be useful. “In our BioGames experiments, the PPV was 76.85%, meaning that more than three quarters of the cells that were labelled as infected, were indeed infected. We also achieved an NPV of 98.78%, such that almost all of the cells labelled as negative are correctly labeled as such.”
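For readers unfamiliar with the metrics quoted above: positive predictive value (PPV) and negative predictive value (NPV) are standard functions of a confusion matrix’s true/false positive and negative counts. A minimal sketch (the counts passed in are illustrative, not the paper’s data):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute PPV, NPV, and overall accuracy from confusion-matrix counts.

    tp: infected cells correctly labelled infected
    fp: healthy cells wrongly labelled infected
    tn: healthy cells correctly labelled healthy
    fn: infected cells wrongly labelled healthy
    """
    ppv = tp / (tp + fp)                    # of cells labelled infected, fraction truly infected
    npv = tn / (tn + fn)                    # of cells labelled healthy, fraction truly healthy
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return ppv, npv, accuracy

# Illustrative example: 100 true positives, 30 false positives,
# 850 true negatives, 20 false negatives.
ppv, npv, accuracy = diagnostic_metrics(100, 30, 850, 20)
```

A high NPV with a lower PPV, as the paper reports, is the expected pattern when infected cells are rare: even a small false-positive rate on the large healthy population drags PPV down, while NPV stays near 100%.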
This all helps alleviate the big data problem for several reasons. For one, humans are significantly better at image discrimination than computers. It may take several months and a significant amount of money to program a computer to diagnose malarial cells with the accuracy a human achieves after five minutes of training. For another, once the game has run, the remaining work is largely statistical: tallying which cells were most frequently reported as infected.
Crowd-sourcing is not a new phenomenon in problem-solving. When websites verify a user’s humanity by having him or her identify a blurred word, a second word is usually given. That second word’s identity is not known to the website; rather, it comes from a scanned book or archive that optical character recognition software failed to read, and the user’s answer helps digitize it. The analogous control here is that every 25th cell has a known identity, so that the players may be properly evaluated and the stronger players’ decisions carry more weight.
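The weighting idea sketched above can be illustrated with a simple weighted vote: each player earns a weight from their accuracy on the known control cells, and that weight scales their votes on unknown cells. To be clear, this is my simplified illustration, not the paper’s MAP estimator:

```python
from collections import defaultdict

def weighted_diagnosis(votes, control_accuracy):
    """Combine player votes into a per-cell diagnosis.

    votes: list of (player_id, cell_id, label) with label 1 = infected, 0 = healthy
    control_accuracy: player_id -> accuracy measured on known control cells
    Returns: cell_id -> 1 if the weighted vote leans infected, else 0
    """
    weighted_sum = defaultdict(float)   # sum of weight * label per cell
    weight_total = defaultdict(float)   # sum of weights per cell
    for player, cell, label in votes:
        w = control_accuracy.get(player, 0.5)  # unknown players get a neutral weight
        weighted_sum[cell] += w * label
        weight_total[cell] += w
    return {cell: int(weighted_sum[cell] / weight_total[cell] >= 0.5)
            for cell in weighted_sum}

# Two strong players outvote one weak one on cell "c1":
votes = [("alice", "c1", 1), ("bob", "c1", 0), ("carol", "c1", 1)]
accuracy = {"alice": 0.9, "bob": 0.6, "carol": 0.9}
print(weighted_diagnosis(votes, accuracy))  # {'c1': 1}
```

The design choice mirrors the article’s point: control cells with known labels let the aggregator trust demonstrated skill rather than treating every click equally.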
Of course, it helps in crowd-sourcing to have a binary response problem. Here, either a cell is infected with malaria or it is not. People would almost assuredly struggle with more options: asked to choose among malaria, sickle-cell, and healthy, players would likely see their accuracy drop, even though sickle-cell is relatively easy to spot. Progress could still probably be made, but it would likely not be as effective.
This brings up the next point that makes malaria diagnosis a good candidate for crowd-sourcing: ease of identification. It takes about three minutes on average to train humans to diagnose malaria in a cell with 78-98% accuracy. Obviously, other diseases are not so simple to spot.
With all that said, crowd-sourcing in this case delegates a tedious but daunting task to a large group whose combined judgments match those of medical professionals. Carried to its logical end, this frees those professionals to focus on diagnosing more advanced diseases while lessening the strain on incoming big-data healthcare analytics platforms.