IBM Watson Learns Deeply, So You Don’t Have To
Advances in the field of cognitive computing are stacking up as researchers push the bounds of deep learning and related techniques like neural networks. But organizations won't have to wait years for these advances to bubble up from academia, thanks to the work that groups like IBM Watson are doing to put the technology into people's hands as quickly as possible.
After Watson won on Jeopardy! in 2011, IBM doubled down on the technology in 2014 and formed the Watson Group. Since then it has shipped a variety of products and services built on the core Watson technology, including Watson Analytics, Watson Discovery Advisor, Watson Engagement Advisor, and Watson Oncology. These products incorporate various aspects of IBM's R&D in artificial intelligence and cognitive computing, such as natural language processing (NLP), speech recognition, and computer vision, to solve specific problems in particular industries.
In addition to these standalone products, IBM is also rolling out a series of Watson analytic services that practically anybody can use within their own applications. These free Watson services, which are cut from the same cognitive cloth as the bigger Watson products, are hosted on IBM's Bluemix cloud and can be called through a simple REST API.
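In practice, calling one of these services amounts to an authenticated HTTP request. The sketch below, using only the Python standard library, shows the general shape of such a call: a JSON POST with Basic authentication. The endpoint path and credentials here are placeholders for illustration; the real URL and service credentials come from an app's Bluemix dashboard, and the request is prepared but not actually sent.

```python
import base64
import json
import urllib.request

# Placeholder endpoint and credentials -- NOT a real Watson service URL.
# The actual host, path, and credentials come from the Bluemix dashboard.
WATSON_URL = "https://gateway.watsonplatform.net/example-service/api/v1/analyze"
USERNAME, PASSWORD = "service-username", "service-password"

def build_watson_request(text):
    """Prepare (but do not send) a Basic-auth JSON POST to a Watson-style REST API."""
    payload = json.dumps({"text": text}).encode("utf-8")
    credentials = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
    request = urllib.request.Request(WATSON_URL, data=payload, method="POST")
    request.add_header("Content-Type", "application/json")
    request.add_header("Authorization", "Basic " + credentials)
    return request

req = build_watson_request("Hello, Watson")
print(req.get_method(), req.full_url)
```

Sending the prepared request with `urllib.request.urlopen(req)` would return the service's JSON response; the point is that no model-building or machine learning expertise is involved on the caller's side.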
This week IBM announced the general availability (GA) of three more cloud-based Watson analytic services: IBM Watson Language Translation, IBM Speech to Text, and IBM Text to Speech, all part of the IBM Watson Developer Cloud. These add to the five that IBM had already released, including its Personality Insights and Tradeoff Analytics services, and three services from the recently acquired AlchemyAPI: AlchemyLanguage, AlchemyVision, and AlchemyData News. All told, more than 20 Watson services are at some stage in the delivery pipeline, whether GA, beta, or experimental.
The three new Watson services will be useful in building systems that communicate more effectively and accurately with people. For example, a customer could use the speech recognition API to convert human speech to text, and then use the Question and Answer API (still in beta) and dialog services to engage in a conversation with the human. Then the computer could speak back to the user via the text-to-speech API that just became available.
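The conversational loop described above can be sketched as a simple composition of the four services. In this sketch the service calls are stubbed out with hypothetical functions and canned return values; in a real application each stub would be a REST call to the corresponding Watson endpoint.

```python
# Sketch of one conversational turn chaining the four services described
# above. Each function is a stub standing in for a REST call; the names,
# signatures, and return values are illustrative, not the real Watson API.

def speech_to_text(audio_bytes):
    # Stub: would POST the caller's audio to the Speech to Text service.
    return "what is the capital of France"

def ask_question(question):
    # Stub: would call the (beta) Question and Answer / dialog services.
    return "The capital of France is Paris."

def text_to_speech(text):
    # Stub: would POST text to the Text to Speech service and get audio back.
    return b"<synthesized audio for: " + text.encode() + b">"

def voice_turn(audio_in):
    """One conversational turn: caller audio in, spoken answer out."""
    question = speech_to_text(audio_in)
    answer = ask_question(question)
    return answer, text_to_speech(answer)

answer, audio_out = voice_turn(b"<caller audio>")
print(answer)  # → The capital of France is Paris.
```

Because each service is a standalone REST endpoint, the chain can be rearranged or extended, for example by inserting the Language Translation service between the question and the answer.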
“That’s an example of four services being combined together,” says Jerome Pesenti, vice president of core technology at IBM Watson. “We have lots of things around Q&A, dialog, multi-modality, and emotion as well. Some of them are already accessible as beta and experimental in the platform.”
Pesenti heads the R&D team in charge of turning the advances in cognitive computing into real-world tools. “The state of technology has improved tremendously in the past 10 years,” he tells Datanami. “We’re finding lots and lots of use cases. We’re working with hundreds of customers already, and putting things in production. Things are happening right now.”
In addition to the three new Watson APIs, IBM announced that Pesenti’s team will be working with renowned deep learning expert Yoshua Bengio and his team at the Montreal Institute for Learning Algorithms (MILA) to push the state of the art even further.
“Yoshua Bengio’s team is one of the leading institutions in that field and we’re spending an enormous amount of resources in that area, and we feel we can do it better by doing it together,” Pesenti says. “We see a lot of use cases, and we have lots of ideas, and they have a lot of knowledge in the space.”
The new services leverage IBM's core R&D in deep learning, but don't require customers to be experts in building and training traditional machine learning models, let alone the neural networks on which they're based, to get the benefits. “You can use these algorithms out of the box,” Pesenti says. “We have huge training we do internally. Then usually we have a small adaptation layer [used by the customer]. They’re just adding data. They don’t create models. They just give us more data and we know how to use the data to enhance the model right away.”
Pesenti sees the recent advances made in deep learning having a wide impact on a variety of products and services. “We think that cognitive technology will be ubiquitous,” he says. “My view is any app will have some kind of [cognitive] component, such as natural language interaction, where instead of typing things, you can just say them and answer from a menu.”
Speech recognition has proven a tough nut to crack for decades, but researchers are making big strides in accuracy thanks to deep learning and neural nets. Pesenti's team recently published a paper describing how they improved the error rate from 12 percent to 8 percent, a significant improvement at this stage of the game.
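Put in relative terms, the gain the paper reports is larger than the four-point absolute drop suggests:

```python
# Relative word-error-rate reduction implied by the reported figures.
old_wer, new_wer = 0.12, 0.08
relative_reduction = (old_wer - new_wer) / old_wer
print(f"{relative_reduction:.0%}")  # → 33%
```

That is, a third of the errors the earlier system made are gone, which is why a seemingly modest absolute change counts as significant at this stage.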
The gains are due to a combination of better algorithms and the capability to train models against huge data sets on big compute clusters. “With deep learning, you can throw raw data at it, raw text and images and audio, and the system will learn the features on its own, which makes the system more flexible to new domains,” he says. “It’s still good to have better data and do curation. But definitely there’s less feature engineering, much less handcrafting of the features.”
The ability to skip the cumbersome feature extraction step helps, but so does having bigger hardware and bigger data sets to train against.
“We have a large cluster of servers using GPUs at IBM,” Pesenti says. “Many of the deep learning training [sessions] use GPUs because it’s faster. That’s a big part. And depending on the workload, speech or vision, you need to use very large training sets, lots of data, lots of computing power…The two combine to really improve performance.”
All that's required to get started with the Watson services is a valid credit card. You can sign up at the IBM Watson Developer Cloud webpage.