The Unreasonable Effectiveness of Speech Data

Piotr Żelasko on OpenAI’s Whisper and other trends in speech recognition technologies.


Piotr Żelasko is Head of Research at Meaning[1], a startup building an AI platform using speech technologies. He has years of experience in speech technologies, both as a researcher and as a software engineer. Together with other members of Meaning’s research team, we recently published an article describing new open source tools for unlocking speech and audio data. We recorded this episode during the week Whisper was released. Whisper is a deep learning model from OpenAI that approaches human-level robustness and accuracy on English speech recognition. Our conversation centered on Whisper and speech recognition, but also touched on the new speech data processing tools (Lhotse, k2, Icefall) that we described in our post.




[1] Ben Lorica is an advisor to Meaning and other startups.

[Image: Sound Files by Ben Lorica; images from Infogram.]