Exhaustion of High-Quality Data Could Slow Down AI Progress in Coming Decades

Pablo Villalobos on lack of data and other bottlenecks for scaling machine learning models.


Pablo Villalobos is a Staff Researcher at Epoch and lead author of the recent paper “Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning”. We discuss the key findings of this paper, as well as a related study Pablo conducted on scaling laws. Scaling laws are empirical relationships describing how a quantity of interest (typically the test loss, or a performance metric on fine-tuning tasks) changes as a function of properties of the model architecture or the optimization process, such as model size, width, or training compute. These relationships can guide the design and training of deep learning models, and they also offer insight into the principles that govern how such models behave.
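Scaling laws are commonly expressed as power laws, for example loss L(N) = a * N^(-alpha), where N is model size, dataset size, or compute. The snippet below is a minimal sketch of how such a law can be fitted: taking logarithms turns the power law into a straight line, so ordinary least squares in log-log space recovers a and alpha. It assumes a pure power law with no irreducible-loss offset, and the constants (a = 10, alpha = 0.07) and data points are synthetic and hypothetical, not results from Pablo's study or the Epoch paper.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss ~= a * size**(-alpha) by least squares on log(size) vs log(loss).

    Assumes a pure power law (no irreducible-loss term). Returns (a, alpha).
    """
    if len(sizes) != len(losses) or len(sizes) < 2:
        raise ValueError("need at least two (size, loss) pairs of equal length")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the line in log-log space
    intercept = mean_y - slope * mean_x    # log(a)
    return math.exp(intercept), -slope     # alpha is the negated slope


def predict_loss(a, alpha, size):
    """Extrapolate the fitted power law to a new size."""
    return a * size ** (-alpha)


# Hypothetical, noise-free data generated from loss = 10 * N**-0.07.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [10 * n ** -0.07 for n in sizes]
a, alpha = fit_power_law(sizes, losses)
print(round(a, 3), round(alpha, 3))  # 10.0 0.07
print(predict_loss(a, alpha, 1e10))  # extrapolated loss at N = 1e10
```

With real measurements the points are noisy and often flatten toward an irreducible loss, so a fit that includes an offset term (L(N) = a * N^(-alpha) + c) via nonlinear least squares is usually more appropriate than this log-log shortcut.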







[Image: Livestreamers, generated with Stable Diffusion.]