Exhaustion of High-Quality Data Could Slow Down AI Progress in Coming Decades

Pablo Villalobos on lack of data and other bottlenecks for scaling machine learning models.


Pablo Villalobos is a Staff Researcher at Epoch and lead author of the recent paper “Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning”. We discuss the key findings of this paper, as well as a related study Pablo conducted on scaling laws. “Scaling laws” are empirical relationships between a quantity of interest (typically the test loss, or a performance metric on fine-tuning tasks) and characteristics of the architecture or optimization process, such as model size, width, or training compute. These laws can guide the design and training of deep learning models, and they also offer insight into the underlying principles.
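To make the idea of a scaling law concrete, here is a minimal Python sketch that fits a power law of the form loss ≈ a · size^(−b) by least squares in log-log space and then extrapolates it. The functional form, the helper names `fit_power_law` and `predict_loss`, and all numbers are illustrative assumptions; they are not taken from the paper or from the episode.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss ~ a * size**(-b) via ordinary least squares on log-log data.

    Hypothetical helper for illustration; returns the tuple (a, b).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the best-fit line log(loss) = intercept + slope * log(size).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a, b, size):
    """Evaluate the fitted power law at a given size."""
    return a * size ** (-b)


# Synthetic, made-up measurements generated from a known power law
# (assumed values, not real experimental results).
true_a, true_b = 10.0, 0.1
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [true_a * s ** (-true_b) for s in sizes]

a, b = fit_power_law(sizes, losses)
extrapolated = predict_loss(a, b, 1e10)  # extrapolate one order of magnitude further
```

Because the synthetic data is noise-free, the fit recovers the generating parameters almost exactly. Real measurements are noisy, and published scaling-law studies often use richer functional forms (for example, with an irreducible loss term), so an extrapolation like this should be read with caution.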


Interview highlights – key sections from the video version:

Related content:

A video version of this conversation is available on our YouTube channel.
Neil Thompson: The Computational Limits of Deep Learning
Jinsung Yoon and Sercan Arik: Generating high-fidelity and privacy-preserving synthetic data
FREE Report: 2023 Trends in Data, Machine Learning, and AI
Yashar Behzadi: Synthetic data technologies can enable more capable and ethical AI
Gabriela Zanfir-Fortuna and Andrew Burt: Preparing for the Implementation of the EU AI Act and Other AI Regulations
Peter Norvig and Alfred Spector: Data Science and AI in Context
Jian Pei: Pricing Data Products


[Image: Livestreamers, generated with Stable Diffusion.]
