Baidu Ups AI Ante With Deep Learning Release
Chinese search giant Baidu Inc. is publicly releasing Chinese language APIs for its primary speech technologies as it and rivals continue to open up their portfolios of AI-based deep learning technologies.
The Beijing-based company (NASDAQ: BIDU) said the APIs cover two variations of speech recognition, a speech synthesis tool and a fourth called “wake word” used to activate voice-controlled digital assistants such as the Amazon Echo.
The speech APIs represent the latest in a series of public releases by the Chinese company that includes natural language processing along with facial and optical character recognition technologies. The release also underscores how AI technology pioneers are attempting to attract application developers as competition heats up in the personal digital assistant and other emerging markets.
In September, Baidu released a deep learning framework called “PaddlePaddle” that it pitches as a simple-to-use platform for developing new products and services. “We are at the dawn of the AI era. By opening our AI technologies, we will make it easier for everyone to create AI-enabled applications,” Andrew Ng, Baidu’s chief scientist, noted in a statement.
The company said its “Long Utterance” speech recognition technology could be used to automatically transcribe audio clips such as interviews, speeches and lectures. Its “Far-Field” speech recognition API targets applications such as voice-controlled consumer devices up to 16 feet away.
The company’s speech synthesis tool is based on deep learning technology that provides a “collection of realistic voices” for applications ranging from reading audio books to reading news articles.
The “wake word” API release would allow developers to create customized short words or phrases to activate devices.
Baidu’s Silicon Valley AI Lab has been focusing on GPU-based deep learning development as the growing availability of massive data sets, combined with deep learning tools, pushes the boundaries of overarching machine learning platforms.
“Deep learning has already tremendously advanced the state of the art of real world problems in computer vision and speech recognition,” Greg Diamos, a former Nvidia Corp. (NASDAQ: NVDA) architect and senior researcher at Baidu, told our sister website HPCwire.
“Many problems in these domains and others that were previously considered too difficult are now within reach,” Diamos argued.
Indeed, Nvidia co-founder and CEO Jen-Hsun Huang emphasized how deep learning tools running on GPUs are propelling big data analytics into the AI arena. “AI computer scientists discovered new algorithms that made it possible to achieve levels of results and perception using this technique called deep learning that nobody had ever imagined,” the Nvidia chief told a technology conference in April.
Meanwhile, a steady stream of open source releases by Baidu parallels gambits by competitors such as Google (NASDAQ: GOOGL), which last year released its deep learning library. The U.S. search giant said the library, called TensorFlow, is used across Google for a range of deep learning applications.