October 20, 2015

Kaggle Tackles Whale of an Identification Problem

Kaggle has hosted lots of different data science competitions over the years, but none quite like the one launched by NOAA Fisheries earlier this year. The government agency, with financial support from MATLAB creator MathWorks, is sponsoring a data science competition on Kaggle to see who can create the best algorithm for identifying individual right whales.

North Atlantic right whales are among the most endangered marine mammals on the planet, with only about 500 individuals remaining. The population has been recovering since hunting ended in the 1930s, but right whales are still threatened, and entanglement and ship strikes are the primary dangers they face in the ocean. Because of the species' precarious foothold on existence, marine biologists working for NOAA Fisheries do everything they can to help these whales, both individually and at the population level.

One of the research tools that marine biologists use to help the right whales is individual identification. According to NOAA Fisheries biologist Christin Khan, being able to correctly identify an individual whale is crucial to helping the whales in the field.

“When we’re out on a boat or on an airplane, sometimes we’re trying to make decisions on what to do with a whale, and that really hinges on identifying it correctly and quickly,” Khan tells Datanami. “For example, sometimes we’re trying to take a biopsy sample to get genetic information from a newborn calf, or we may be trying to remove fishing gear from a whale. Or we might be putting a satellite tag on it, and trying to target a particular individual to correctly tag it.”

The biologists identify right whales by looking at the pattern of white callosities on their heads. While other whale species are identified by the shape of their flukes or other features, right whales are identified by their callosities, rough patches of skin on the head that colonies of whale lice turn white. The pattern serves as a sort of fingerprint for each animal.

The pattern matching is still done largely by hand. “We send these photographs back to the office and load them onto the computer and compare them by eye to the catalog maintained by the New England Aquarium,” Khan says. “Depending on the skill of the researcher or the uniqueness of the whale, that can be done very quickly. But sometimes it takes hours.”

NOAA is hoping to automate much of that process by using machine learning algorithms to identify the whales from photographs of their callosities. And it’s looking to Kaggle’s data science competition website to help it find the best algorithms for the job.

Thanks to MathWorks, which ponied up $10,000 in prize money for the top three winners, the Kaggle competition, titled “Right Whale Recognition,” has attracted more than 100 participants. It’s money well spent, says MathWorks Technical Marketing Manager Paul Pilotte.

“In addition to being a really important problem that we’re happy to help out with, it turns out this is a really challenging problem, and one we’re seeing lots of demand for as well,” Pilotte says. “Being able to take images and do the steps to automatically pre-process those images, to use techniques like computer vision, machine learning, and deep learning, can basically automate the process that Christin’s team has to do in a manual fashion.”

The advent of deep learning combined with GPUs has greatly accelerated progress in image classification, says Pilotte, who notes that much of the research in the field is being done by companies like Google, Facebook, and Baidu. This competition may be about identifying whales by the white splotches on their heads, but the solution could also apply to other areas, such as self-driving cars.

“We’re seeing a lot of our customers, scientists and engineers, wanting to apply those techniques to a number of other really challenging types of applications, where this level of automation can save a lot of time and effort,” Pilotte says. “The fact that it’s image processing married with classification… it’s an important area but one that might not always be seen in a Kaggle competition.”

NOAA supplied the Kaggle competition participants with a collection of 10,000 labeled photographs to help them train their algorithms. Matching the callosity patterns of unknown subjects against a catalog that contains the entire known population of North Atlantic right whales might seem a simple matter, but it’s anything but.

According to Khan, the Kaggle competitors are having difficulty just getting the algorithms to focus on the right part of the picture. Whitewater churned up by a surfacing whale, or the animal’s curled tail, can throw the algorithms off, she says.

“We primarily are trying to get an overhead clean shot looking head down on the whale, but of course what’s ideal and reality don’t always match up,” she says. “A lot of these pictures are taken at different angles.”

If the competition pans out and some compelling algorithms rise to the surface, the plan is to embed the best of them in an application that can easily be used in the field, perhaps running on a laptop or even on a smartphone. “Our hope is that by automating the process we can speed it up instead of a human having to trawl through a catalog of all 500 individuals,” Khan says. “If we can quickly get a computer algorithm to give us ideally a match or the top two or three whales…that would really make the process of doing what we need to do much quicker.”
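Khan's request for "a match or the top two or three whales" amounts to ranked retrieval against the catalog. The sketch below is illustrative only and assumes that some upstream image model (not shown, and not something the competition prescribes) has already turned each photograph into a fixed-length feature vector. The whale IDs, the vectors, and the helpers `cosine_similarity` and `top_k_matches` are hypothetical, and cosine similarity is just one common scoring choice, not necessarily what any competitor used.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: each whale photo has already been converted into a
# fixed-length feature vector by some upstream image model (not shown here).
Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def top_k_matches(query: Vector, catalog: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Score every catalogued whale against the query and return the k best, highest first."""
    scored = [(whale_id, cosine_similarity(query, vec)) for whale_id, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy catalog with made-up IDs and 3-dimensional vectors (real embeddings would be much longer).
catalog = {
    "whale_A": [1.0, 0.0, 0.0],
    "whale_B": [0.9, 0.1, 0.0],
    "whale_C": [0.0, 1.0, 0.0],
    "whale_D": [0.0, 0.0, 1.0],
}
matches = top_k_matches([1.0, 0.2, 0.0], catalog)
print([whale_id for whale_id, _ in matches])  # → ['whale_B', 'whale_A', 'whale_C']
```

Returning a short ranked list rather than a single answer leaves the final call with the biologist, which fits the workflow Khan describes.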

NOAA Fisheries might also use this technique to automate its work with other endangered marine mammals, including the humpback whale, the Steller sea lion, and the southern resident killer whales.

“That’s another area where visual or machine learning would help,” says Jonathan Shannon, an outreach specialist with NOAA Fisheries. “If we could give [the algorithm] parameters or conditions where it can pick out anomalous body shapes or body sizes in migrating whales we’ve been studying, [it would help us] get rid of the normal ones and focus on the anomalies where the scientist has to make a decision.”
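The triage Shannon describes, setting aside the normal animals so scientists can focus on the unusual ones, resembles outlier detection. The sketch below assumes each whale has already been reduced to a single measurement, such as an estimated body length from aerial photos; the IDs, values, and the `flag_anomalies` helper are hypothetical. The technique is a swapped-in choice, not NOAA's: a modified z-score built on the median and the median absolute deviation (MAD), with the commonly cited 3.5 cutoff, which a single extreme value distorts far less than it distorts a mean-and-standard-deviation score.

```python
import statistics
from typing import Dict, List


def flag_anomalies(measurements: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return IDs whose modified z-score (median/MAD based) exceeds the threshold."""
    values = list(measurements.values())
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that one outlier barely moves.
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing stands out
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    return [
        whale_id
        for whale_id, value in measurements.items()
        if 0.6745 * abs(value - median) / mad > threshold
    ]


# Made-up body lengths in metres; whale_10 is deliberately unusual.
lengths_m = {
    "whale_01": 13.1, "whale_02": 12.8, "whale_03": 13.4, "whale_04": 13.0,
    "whale_05": 12.9, "whale_06": 13.2, "whale_07": 13.3, "whale_08": 12.7,
    "whale_09": 13.0, "whale_10": 9.0,
}
flagged = flag_anomalies(lengths_m)
print(flagged)  # → ['whale_10']
```

Only the flagged animals would then go to a scientist for a decision; everything else is treated as normal.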

The Right Whale Recognition competition runs through early January 2016.

(feature art: Tim Robinson/