Piotr Żelasko on OpenAI’s Whisper and other trends in speech recognition technologies.
Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS.
Piotr Żelasko is Head of Research at Meaning[1], a startup building an AI platform using speech technologies. He has years of experience in speech technologies, both as a researcher and as a software engineer. Along with other members of Meaning’s research team, we recently published an article describing new open source tools for unlocking speech and audio data. We recorded this episode during the week of the release of Whisper, a deep learning model from OpenAI that approaches human-level robustness and accuracy on English speech recognition. Our conversation centered on Whisper and speech recognition, but also touched on the new speech data processing tools (Lhotse, k2, Icefall) that we described in our post.
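For readers who want to try Whisper on their own recordings, here is a minimal sketch using the open source whisper Python package that OpenAI released alongside the model; "audio.mp3" is a placeholder for your own file, and the model size is a choice you can adjust for speed versus accuracy:

```python
# pip install openai-whisper
import whisper

# Load a pretrained checkpoint ("base" is small and fast;
# "medium" or "large" trade speed for accuracy).
model = whisper.load_model("base")

# Transcribe a local audio file; Whisper resamples the input internally.
result = model.transcribe("audio.mp3")
print(result["text"])

# Whisper can also translate non-English speech into English text
# by passing task="translate" to transcribe().
```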
Highlights in the video version:
- Introduction to Piotr Żelasko
- Whisper, speech recognition systems, and creating uniform datasets
- Translation and Whisper
- Whisper benchmarks, training, and zero-shot learning
- Will we have more intelligent ASR systems?
- What is the role of a tool like Lhotse?
- How disruptive is Whisper to ASR?
- Speech industry reaction to Whisper
- Eliminating ambient noise and managing massive amounts of data
Related content:
- A video version of this conversation is available on our YouTube channel.
- New open source tools to unlock speech and audio data
- Mark Chen: How DALL·E works
- Speech synthesis technologies will drive the next wave of innovative voice applications
- Nic Hohn and Max Pumperla: Reinforcement Learning in Real-World Applications
- A Guide to Data Annotation and Synthetic Data Generation Tools
- fastdup: Introducing a new free tool for curating image datasets at scale
If you enjoyed this episode, please support our work by encouraging your friends and colleagues to subscribe to our newsletter.
[1] Ben Lorica is an advisor to Meaning and other startups.
[Image: Sound Files by Ben Lorica; images from Infogram.]