Microsoft Claims Speech Recognition ‘Parity’
Microsoft speech recognition researchers report they have achieved parity with humans using a testing framework that gauges error rates for professional transcribers and the ability to understand “open-ended conversations.”
“In both cases, our automated system establishes a new state-of-the-art, and edges past the human benchmark,” members of the Microsoft Artificial Intelligence and Research group claimed in a paper published this week. “This marks the first time that human parity has been reported for conversational speech.”
Using a U.S. human error rate benchmark to test its speech recognition system, the researchers said their approach equaled the 5.9 percent error rate of professional transcribers and the 11.3 percent error rate for open-ended conversations among friends or family members.
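Figures like 5.9 percent refer to word error rate (WER), the standard metric in speech recognition: the word-level edit distance (substitutions, insertions, and deletions) between a system's transcript and a reference transcript, divided by the reference length. A minimal sketch of the computation (the function name and example sentences are illustrative, not from the paper):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance via dynamic programming over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One missed backchannel word out of seven reference words:
print(wer("uh huh i see what you mean", "uh i see what you mean"))
```

On this toy pair the single deleted "huh" yields a WER of 1/7, about 14 percent, which is why the short function words and backchannel tokens discussed later in the article weigh so heavily on the metric.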
The Microsoft (NASDAQ: MSFT) team attributed the advance to the systematic use of convolutional and recurrent neural networks, including LSTM (Long Short-Term Memory) networks. These speech recognition networks were combined with a “spatial smoothing” method and an acoustic training technique.
Since a single measure of human performance was insufficient to accurately gauge an automated system, the conversational speech recognition system was compared against professional transcribers on both the “Switchboard” benchmark and a separate “CallHome” test. The new Microsoft system showed an improvement of about 0.4 percent, the researchers reported, exceeding human performance “by a small margin.”
Convolutional models were found to perform best, but the researchers noted that LSTM networks also showed promise for both acoustic and language modeling. Inspired by the human auditory cortex, the part of the brain responsible for hearing, the researchers said they employed a spatial smoothing technique to improve the accuracy of their LSTM models.
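The appeal of LSTMs for both acoustic and language modeling is their gating mechanism, which lets the network carry context across long audio frame or word sequences. A minimal single-step sketch of a standard LSTM cell in NumPy (the shapes, weights, and inputs here are illustrative dummies, not the paper's actual models):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W stacks the four gate weight matrices;
    each gate reads the concatenated [input, previous hidden state]."""
    z = W @ np.concatenate([x, h_prev]) + b
    n = h_prev.size
    i = sigmoid(z[0:n])          # input gate: what to write
    f = sigmoid(z[n:2 * n])      # forget gate: what to keep
    o = sigmoid(z[2 * n:3 * n])  # output gate: what to expose
    g = np.tanh(z[3 * n:4 * n])  # candidate cell contents
    c = f * c_prev + i * g       # updated long-term memory cell
    h = o * np.tanh(c)           # new hidden state
    return h, c

# Run the cell over a short dummy feature sequence.
rng = np.random.default_rng(0)
n_in, n_hid = 8, 4
W = rng.standard_normal((4 * n_hid, n_in + n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for _ in range(10):
    h, c = lstm_step(rng.standard_normal(n_in), h, c, W, b)
print(h.shape)  # (4,)
```

The forget gate is the key design choice: because the memory cell `c` is updated additively rather than overwritten, gradients can flow across many time steps, which is what makes these networks effective on long conversational utterances.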
The researchers used three variants of convolutional neural networks in their acoustic model along with a combination of complementary models.
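One common way to combine complementary acoustic models is to average their frame-level class posteriors; the sketch below shows that generic scheme, and the function name, weights, and dummy data are illustrative assumptions rather than the paper's actual combination method:

```python
import numpy as np

def combine_posteriors(posteriors, weights=None):
    """Fuse per-model posteriors by weighted averaging.

    posteriors: list of (frames, classes) arrays, one per model,
    each row a probability distribution over acoustic classes."""
    stacked = np.stack(posteriors)  # (models, frames, classes)
    if weights is None:
        weights = np.ones(len(posteriors)) / len(posteriors)
    combined = np.tensordot(weights, stacked, axes=1)
    # Renormalize so each frame remains a valid distribution.
    return combined / combined.sum(axis=1, keepdims=True)

# Three dummy models, 3 frames, 5 acoustic classes each.
rng = np.random.default_rng(1)
models = [rng.dirichlet(np.ones(5), size=3) for _ in range(3)]
fused = combine_posteriors(models)
print(fused.shape)  # (3, 5)
```

Averaging works best when the component models make uncorrelated errors, which is the rationale for mixing architecturally different networks, such as the three convolutional variants described above.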
The neural networks incorporated into the speech recognition system were trained on Microsoft’s “cognitive toolkit” running on Linux-based servers with multiple GPUs. The toolkit leveraged graphics processing to accelerate the training of acoustic models that previously required weeks or months.
Microsoft released its Computational Network Toolkit on GitHub earlier this year, saying it undertook the project out of necessity: current tools used to improve how computers understand human speech were slowing progress.
Meanwhile, the Microsoft researchers’ analysis of human versus machine errors indicated “substantial equivalence,” with the exception of recognizing familiar aspects of human speech known as “backchannel acknowledgements” such as “uh-huh” and hesitations like “um.”
“The distinction is that backchannel words like ‘uh-huh’ are an acknowledgment of the speaker, also signaling that the speaker should keep talking, while hesitations like ‘uh’ are used to indicate that the current speaker has more to say and wants to keep his or her turn.” These “turn-management devices” therefore “have exactly opposite functions” when a speech recognition system attempts to classify individual words, they noted.
Certain words continue to trip up both human transcribers and speech recognition systems. For example, the researchers found that so-called “short function words” generate the most errors.
Illustrating the subtleties of human speech, they found that the word “I” was omitted most often by transcribers. “While we believe further improvement in function and content words is possible, the significance of the remaining backchannel/hesitation confusions is unclear,” they added.