October 11, 2013

Kaggle Contest Aims to Separate Cats from Dogs

Isaac Lopez

“Deep Blue beat Kasparov at chess in 1997. Watson beat the brightest trivia minds at Jeopardy in 2011. Can you tell Fido from Mittens in 2013?” This is the message that data scientists are greeted with as part of the latest machine learning competition run by Kaggle. The challenge: building an algorithm that can distinguish cats from dogs.

In this latest machine learning standoff, the data science dungeon masters at Kaggle have gotten their hands on a data set from Microsoft Research that contains over three million photos of dogs and cats from the world’s largest site for locating homes for pets.

The images themselves have been gathered by a group of coders for the Asirra project, a human interactive proof that functions as a CAPTCHA (Completely Automated Public Turing Test to tell Computers and Humans Apart). As with other types of CAPTCHA programs, Asirra is used to give websites the ability to filter bots from actual users browsing among their pages.

This latest Kaggle competition is apparently aimed at defeating this particular CAPTCHA by giving an algorithm enough brains to tell the two furry friends apart. The contest, which is considered to be merely “playground” fun, is not sponsored by a major technology interest, but is instead being run as a competitive diversion for those involved. First place for the competition is not a job interview opportunity at Facebook, but a $76 donation to the ASPCA (or animal charity of the winner’s choosing).

The challenge itself is actually quite considerable. “While random guessing is the easiest form of attack, various forms of image recognition can allow an attacker to make guesses that are better than random,” explains the Kaggle competition admins. “There is enormous diversity in the photo database (a wide variety of backgrounds, angles, poses, lighting, etc.), making accurate automatic classification difficult.”

In a 2008 paper published by Philippe Golle at the Palo Alto Research Center, Golle explained that the state of the art for this specific type of recognition is a classifier which is 82.7% accurate in telling apart the images of cats and dogs used in Asirra – which is enough to render the proof engine obsolete. Despite this, the competition masters at Kaggle are challenging their community to take this success to the next level.
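To see why a per-image accuracy in the low 80s is already a problem for the CAPTCHA, it helps to work out the odds of a bot passing a whole challenge. The rough sketch below assumes a challenge of 12 images (Asirra's usual design, not a figure given in this article) and that the classifier's mistakes are independent from one image to the next. The function name and parameters are made up for illustration.

```python
def challenge_pass_probability(per_image_accuracy: float,
                               images_per_challenge: int = 12) -> float:
    """Chance that a bot labels every image in one challenge correctly.

    Assumes each image is classified independently, and that a challenge
    shows 12 images (an assumption; the article does not state this).
    """
    return per_image_accuracy ** images_per_challenge


# Blind guessing between "cat" and "dog" on every image.
random_guess = challenge_pass_probability(0.5)

# The 82.7% per-image accuracy reported in the 2008 paper.
golle_2008 = challenge_pass_probability(0.827)

for label, p in [("random guess", random_guess), ("82.7% classifier", golle_2008)]:
    print(f"{label}: passes about 1 challenge in {1 / p:,.0f}")
```

Under these assumptions, blind guessing succeeds about once in 4,096 tries, while the 82.7% classifier succeeds roughly once in ten. For an automated attacker who can retry cheaply, that is more than enough.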

“We have created this contest to benchmark the latest computer vision and deep learning approaches to this problem. Can you crack the CAPTCHA? Can you improve the state of the art? Can you create lasting peace between cats and dogs? Okay, we’ll settle for the former.”

The competition has gotten off to a strong start since it launched on September 25th. With 30 individuals registered for the competition, and 78 entries at this article’s publishing time, the top entry seems to have shattered the 82.7% mark with an accuracy score of 96.7% within two competition entries. The next highest score is 85.7%, raising some questions about the methods used by the leading scorer.

While the Kaggle community is generally good-spirited in their competition, the concern of cheating is always present. “This particular competition is not appealing to me because it will be a ‘try to catch a cheater’ competition,” wrote one community member in the Kaggle forums.

The competition is scheduled to run over the next three months, ending on Saturday, February 1, 2014.
