Baidu’s AI Algorithm Parses Video
China’s heavy investment in artificial intelligence is beginning to bear fruit: researchers at Baidu, the Internet search giant, report winning a competition designed to test the ability of AI algorithms to recognize and classify actions in video clips.
The ActivityNet Challenge was designed to gauge the ability of AI algorithms to move beyond categorizing still images to recognize actions contained in 10-second video clips. Baidu Research (NASDAQ: BIDU) said its AI approach identified the contents of a database of 300,000 YouTube videos with an average accuracy rate of 87.6 percent. That rate was 1.5 percent higher than the second-place finisher, the company added.
Among the applications of Baidu’s video recognition and classification approach is personalizing the newsfeeds of its search engine users, which increasingly include video content. Other applications include software that could more accurately screen hours of footage from security cameras.
Baidu Research’s AI algorithm is based on a spatial-temporal modeling framework in which video data was used to train a neural network to extract video features such as color and audio. Those features were then fed into four temporal models for video classification.
“Understanding what’s in a video is more challenging than recognizing still images,” Baidu researchers Xiao Liu and Shilei Wen noted in a blog post describing the company’s approach. “The content of a video is determined by all the frames; hence it is necessary to aggregate the multi-frame information to yield robust and discriminative video representation.”
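The aggregation idea in that quote can be illustrated with a small Python sketch. This is not Baidu's implementation: the two pooling functions are simple stand-ins for the temporal models the researchers describe, and the names `mean_pool`, `max_pool`, `classify` and the weight matrix are hypothetical, chosen only for illustration. The sketch assumes each frame has already been reduced to a feature vector, combines the frames into one video-level vector per temporal model, and averages the resulting class probabilities.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frames = Sequence[Sequence[float]]  # one feature vector per frame (assumed input)


def mean_pool(frames: Frames) -> List[float]:
    """Average each feature dimension over all frames."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def max_pool(frames: Frames) -> List[float]:
    """Keep the strongest response per feature dimension across frames."""
    return [max(f[d] for f in frames) for d in range(len(frames[0]))]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    frames: Frames,
    temporal_models: Sequence[Callable[[Frames], List[float]]],
    weights: Sequence[Sequence[float]],
) -> Tuple[int, List[float]]:
    """Average class probabilities over several frame-aggregation models.

    `weights` is a hypothetical linear classifier: one row per action class.
    """
    all_probs = []
    for model in temporal_models:
        video_vec = model(frames)  # multi-frame information in one vector
        scores = [sum(w * x for w, x in zip(row, video_vec)) for row in weights]
        all_probs.append(softmax(scores))
    n_classes = len(weights)
    avg = [sum(p[c] for p in all_probs) / len(all_probs) for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg


# Two frames with two features each; class 0 responds to the first feature.
frames = [[1.0, 0.0], [3.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
label, probs = classify(frames, [mean_pool, max_pool], weights)
```

In a real system the per-frame features would come from a trained neural network and the temporal models would be learned rather than fixed pooling rules; the sketch only shows why every frame contributes to the final label.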
The upshot is that frameworks tested by Baidu Research and other competitors demonstrated that AI algorithms could now be used not only to identify still images but also to understand actions in videos rather than simply interpreting what the image represents. The company likens the approach to understanding the difference between nouns and verbs.
Growing competition to develop video classification algorithms also reflects the booming market for video analytics as high-definition video cameras become ubiquitous and the volume of video data explodes. Industry experts note that video is the fastest growing data type on the Internet.
Hence, Baidu, Facebook (NASDAQ: FB), Google, IBM (NYSE: IBM) and others are investing heavily in new AI approaches designed to sort through everything from uploaded cat videos and surveillance camera footage to geospatial data captured by drones equipped with cameras. Ultimately, the rivals are seeking new ways for AI models to understand the physical world.
Competitors in the ActivityNet competition used a data set compiled by DeepMind Technologies, the U.K.-based AI specialist acquired by Google (NASDAQ: GOOGL) in 2014. The Kinetics data set was drawn from videos uploaded to YouTube covering about 400 human actions ranging from mountain climbing to flower arranging.
More details on the Baidu Research video classification framework are available here.