End-to-end deep learning models for speech applications

The Data Exchange Podcast: Yishay Carmiel on recent progress in speech technologies.

Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS.

In this episode of the Data Exchange I speak with Yishay Carmiel, an AI Leader at Avaya, a company focused on digital communications. He has long been immersed in speech technologies and conversational applications, and I have frequently turned to him to understand the latest in speech systems. We previously co-wrote an article listing recommendations for teams building speech applications, and we had an earlier conversation on the impact of deep learning and big data on speech technologies.

Are you using AI responsibly? Join us December 15, 2020 for a series of short talks on Responsible AI. It’s free, and you can join the livestream or access the sessions on demand.

We focused on recent developments in speech technologies. We recorded this podcast right after the NLP Summit, where one of the keynotes was presented by Bo Li, a noted researcher from Google. In his keynote, Bo gave an overview of end-to-end speech models for ASR (automatic speech recognition). An end-to-end model folds the functions of traditionally separate components (in a conventional ASR pipeline, typically the acoustic model, pronunciation lexicon, and language model) into a single neural network and optimizes them jointly.

[Image: Bo Li keynoting at the 2020 NLP Summit.]

Given that he works with both researchers and application builders, I devoted this episode to understanding Yishay’s perspective on the rise of end-to-end ASR models, text-to-speech systems, and responsible AI in the context of speech technologies. According to Yishay, real-time end-to-end deep learning models aren’t yet widely available in open source:

    Usually when I’m thinking about cutting-edge speech recognition systems, I’m thinking of ‘real-time’ speech recognition: this means that while I’m talking the system is already doing some speech recognition. For end-to-end systems there isn’t some open source solution or open architecture that you can use to build a ‘real-time’ speech recognition system. … End-to-end, ‘real-time’ systems represent the future.
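
To make the ‘real-time’ distinction concrete: a streaming recognizer emits partial transcripts while audio is still arriving, rather than waiting for the whole utterance. The toy Python sketch below is only an illustration and is not from the episode. It assumes a hypothetical model that has already produced one character label per audio frame, and it uses CTC-style greedy decoding (collapse repeated labels, drop blanks). The names stream_decode, BLANK, and chunk_size are invented for this example.

```python
from typing import Iterable, Iterator, List

BLANK = "_"  # hypothetical CTC "blank" symbol, assumed for this sketch


def stream_decode(frame_labels: Iterable[str], chunk_size: int = 4) -> Iterator[str]:
    """Yield a partial transcript after every `chunk_size` frames.

    Greedy CTC-style decoding: a label is emitted only when it differs
    from the previous frame's label and is not the blank symbol.
    """
    text: List[str] = []
    prev = BLANK
    count = 0
    for label in frame_labels:
        if label != prev and label != BLANK:
            text.append(label)
        prev = label
        count += 1
        # Emit a partial hypothesis at each chunk boundary.
        if count % chunk_size == 0:
            yield "".join(text)
    # Flush whatever remains after the last full chunk.
    if count % chunk_size != 0:
        yield "".join(text)


# Simulated per-frame output for the word "hello" (assumed, not real model output).
frames = list("hh_e_ll_ll_oo")
partials = list(stream_decode(frames))
for hypothesis in partials:
    print(hypothesis)
```

An offline system would return only the final string. The streaming version surfaces each intermediate hypothesis as soon as its chunk has been processed, which is the behavior Yishay describes as ‘real-time’.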


Subscribe to our Newsletter:
We also publish a popular newsletter where we share highlights from recent episodes, trends in AI / machine learning / data, and a collection of recommendations.


[Image by Aaron Washington from Pixabay.]