End-to-end deep learning models for speech applications

The Data Exchange Podcast: Yishay Carmiel on recent progress in speech technologies.


In this episode of the Data Exchange I speak with Yishay Carmiel, an AI Leader at Avaya, a company focused on digital communications. He has long been immersed in speech technologies and conversational applications, and I have frequently turned to him to understand the latest in speech systems. We previously co-wrote an article listing recommendations for teams building speech applications, and we had an earlier conversation on the impact of deep learning and big data on speech technologies.


We focused on recent developments in speech technologies. We recorded this podcast right after the NLP Summit, where one of the keynotes was presented by Bo Li, a noted researcher from Google. In his keynote, Bo gave an overview of end-to-end speech models for ASR (automatic speech recognition). An end-to-end model folds the functions of traditionally separate components into a single neural network and optimizes them jointly.
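To make "optimizes them jointly" a bit more concrete, here is a toy sketch in plain Python. It is not an ASR model and is not taken from Bo's keynote: the two scalar "stages" (w_front and w_back), the training pairs, and the function name train_jointly are all hypothetical stand-ins. The only point is that a single loss at the output updates both stages together, instead of each stage being tuned on its own.

```python
from typing import List, Tuple


def train_jointly(pairs: List[Tuple[float, float]],
                  steps: int = 500,
                  lr: float = 0.02) -> Tuple[float, float]:
    """Fit two chained scalar stages with one shared squared-error loss.

    Hypothetical toy: w_front stands in for an early stage (e.g. feature
    extraction) and w_back for a later stage (e.g. decoding).
    """
    w_front, w_back = 0.5, 0.5
    for _ in range(steps):
        for x, target in pairs:
            hidden = w_front * x      # output of the first stage
            pred = w_back * hidden    # output of the second stage
            err = pred - target       # single end-of-pipeline error
            # Chain rule: the same error signal reaches both stages.
            grad_back = 2 * err * hidden
            grad_front = 2 * err * w_back * x
            w_front -= lr * grad_front
            w_back -= lr * grad_back
    return w_front, w_back


# Toy data where the target is always 2 * x.
w_front, w_back = train_jointly([(1.0, 2.0), (2.0, 4.0)])
print(w_front * w_back)  # the combined mapping should end up close to 2
```

Neither stage has a "correct" value on its own here; only the combined mapping is constrained by the loss, which is the sense in which the pieces are trained as one system.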

[Image: Bo Li keynoting at the 2020 NLP Summit.]

Given that he works with both researchers and application builders, I devoted this episode to understanding Yishay’s perspective on the rise of end-to-end ASR models, text-to-speech systems, and responsible AI in the context of speech technologies. According to Yishay, real-time, end-to-end deep learning models are not yet widely available in open source:

Usually when I’m thinking about cutting-edge speech recognition systems, I’m thinking of ‘real-time’ speech recognition: this means that while I’m talking the system is already doing some speech recognition. For end-to-end systems there isn’t some open source solution or open architecture that you can use to build a ‘real-time’ speech recognition system. … End-to-end, ‘real-time’ systems represent the future.
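As a rough illustration of what "real-time" means in that quote, the sketch below emits a partial transcript after every incoming chunk rather than waiting for the full utterance. Everything in it is an assumption made for illustration: stream_transcripts and toy_decode_step are hypothetical names, and the "audio chunks" are pre-labeled strings so the example runs without a model. A real system would replace toy_decode_step with an incremental model update.

```python
from typing import Callable, Iterable, Iterator, List


def stream_transcripts(
    chunks: Iterable[str],
    decode_step: Callable[[List[str], str], List[str]],
) -> Iterator[str]:
    """Yield a partial transcript after each incoming chunk."""
    words: List[str] = []
    for chunk in chunks:
        # Update the running hypothesis using only audio seen so far.
        words = decode_step(words, chunk)
        yield " ".join(words)


def toy_decode_step(words: List[str], chunk: str) -> List[str]:
    """Hypothetical stand-in for a model: a chunk is either a word or silence ('')."""
    return words + [chunk] if chunk else words


# Partial results become available while the "speaker" is still talking.
partials = list(stream_transcripts(["recognize", "", "speech"], toy_decode_step))
print(partials)
```

The design point is the generator: the caller can act on each partial hypothesis as soon as it is produced, which is the behavior a batch recognizer cannot offer.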



Related content:

A video version of this conversation is available on our YouTube channel.
Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies.
Got speech? These guidelines will help you get started building voice applications
Yishay Carmiel: “Commercial speech recognition systems in the age of big data and deep learning”
Alan Nichol: “Best practices for building conversational AI applications”
Matthew Honnibal: “Building open source developer tools for language applications”
Marco Ribeiro: “Testing Natural Language Models”

[Image by Aaron Washington from Pixabay.]
