The Unreasonable Effectiveness of Speech Data

Piotr Żelasko on OpenAI’s Whisper and other trends in speech recognition technologies.

Piotr Żelasko is Head of Research at Meaning¹, a startup building an AI platform using speech technologies. He has years of experience in speech technologies, both as a researcher and as a software engineer. Together with Piotr and other members of Meaning’s research team, we recently published an article describing new open source tools for unlocking speech and audio data. We recorded this episode during the week Whisper was released. Whisper is a deep learning model from OpenAI that approaches human-level robustness and accuracy on English speech recognition. Our conversation centered on Whisper and speech recognition, but also touched on the new speech data processing tools (Lhotse, k2, Icefall) that we described in our post.

Highlights in the video version:

Introduction to Piotr Żelasko

Whisper, speech recognition systems, and creating uniform datasets

Translation and Whisper

Whisper benchmarks, training, and zero-shot learning

Will we have more intelligent ASR systems?

What is the role of a tool like Lhotse?

How disruptive is Whisper to ASR?

Speech industry reaction to Whisper

Eliminating ambient noise and managing massive amounts of data

Related content: