Percy Liang on new tools for the Holistic Evaluation of Language Models.
Percy Liang is Associate Professor of Computer Science and Statistics, and Director of the new Center for Research on Foundation Models at Stanford University. We discussed a new suite of tools (HELM) designed to help users and researchers understand language models in their totality. We also discussed recent trends in AI, including the rise of Generative AI and Foundation Models.
If there’s only API access, that’s not enough. It depends on how technical and how much a company wants to invest, and how much data they have, and whether they are comfortable shipping data to an API and/or to someone else. I don’t think there will be just one GPT-like model that rules them all. It will come down to the dynamics of how organizations are structured, and considerations like trust, cost, and other things.
– Percy Liang on the likely rise of decentralized custom models.
Interview highlights – key sections from the video version:
- What are language models?
- What is HELM (Holistic Evaluation of Language Models)?
- Using HELM
- Metrics they plan to add to HELM
- The impact of Model Size – key findings from HELM
- “Private” vs “Public” models
- Are we going to run out of data?
- Fine tuning and pre-training
- Foundation Models
- HELM roadmap and schedule
- The Center for Research on Foundation Models at Stanford University

Related content:
- A video version of this conversation is available on our YouTube channel.
- Holistic Evaluation of Language Models
- Roy Schwartz: Efficient Methods for Natural Language Processing
- Barret Zoph and Liam Fedus: Efficient Scaling of Language Models
- Connor Leahy and Yoav Shoham: Large Language Models
- Foundation Models: A Primer for Investors and Builders
- Machine Learning Trends You Need To Know
- Mark Chen of OpenAI: How DALL·E works
- Jack Clark: The 2022 AI Index
- Piotr Żelasko: The Unreasonable Effectiveness of Speech Data
- fastdup: Introducing a new free tool for curating image datasets at scale
[Image: Evaluating a Language Model, by Ben Lorica, with images generated using DALL-E 2.]