The Data Exchange Podcast: Charles Martin on how ideas from physics can be used to build practical tools for evaluating and tuning neural networks.
This week’s guest is Charles Martin, independent researcher and founder of Calculation Consulting, a boutique consultancy focused on data science and machine learning. Along with Michael Mahoney and Serena Peng, Charles is co-author of a recent Nature Communications paper on new methods for evaluating and tuning deep learning models (“Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data”).
In a popular episode earlier this year, Mike Mahoney described some of the work covered in the paper listed above. In this episode, Charles provides an update on refinements to their methodology and to WeightWatcher, their open-source diagnostic tool for analyzing deep neural networks. Topics we covered include:
- How ideas from statistical mechanics can be used to measure the quality of a given machine learning model.
- Why it is of great practical interest to have metrics that gauge the quality of a trained model even in the absence of training/testing data, and without any detailed knowledge of the training process.
- How their tools can be used to tune models, a pressing problem in areas like NLP, as our upcoming 2021 NLP Industry Survey will confirm.
- The potential for inserting their tools and methods into many parts of an ML team’s workflow: model QA/validation, or even early-stage model filtering.
I think we’re just scratching the surface of what it takes to make AI an engineering discipline. That means understanding what makes something reliable, what makes it robust, and how you evaluate it. You can’t do everything by brute force; there have to be cleverer things you can do. You can’t test a bridge by building it and just driving cars over it to see if it fails. That is what we used to do: we built bridges, we put them up, the wind would blow, and they would fall down.
We’re just getting to the point now where we are asking questions like: “Can I predict the test accuracy of a model without looking at test data?” That’s a critical thing. You can’t always do cross-validation, and cross-validation can’t really give you out-of-sample performance, because you’re still using your training data to evaluate it. Our tools allow us to predict the test accuracy without looking at the data, and certainly without looking at the test data.
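The quality metric behind this work comes from fitting a power law to the tail of the eigenvalue spectrum of each layer’s weight matrix: well-trained layers tend to show heavier tails (smaller power-law exponents, often called alpha). Here is a minimal, self-contained NumPy sketch of that idea using a simple Hill estimator. The function name, the default tail size, and the estimator choice are illustrative assumptions, not WeightWatcher’s actual implementation, which does far more careful spectral fitting:

```python
import numpy as np

def powerlaw_alpha(W, k=50):
    """Estimate the power-law exponent (alpha) of the tail of the
    eigenvalue spectrum of W^T W via a simple Hill estimator.
    In the heavy-tailed self-regularization picture, smaller alpha
    tends to indicate a better-trained layer."""
    # Eigenvalues of the correlation matrix W^T W are the squared
    # singular values of W.
    eigs = np.linalg.svd(W, compute_uv=False) ** 2
    tail = np.sort(eigs)[-k:]       # the k largest eigenvalues
    xmin = tail[0]                  # lower cutoff of the tail
    # Hill estimator: alpha = 1 + k / sum(log(x_i / xmin))
    return 1.0 + k / np.sum(np.log(tail / xmin))

rng = np.random.default_rng(0)
W = rng.standard_normal((500, 300))  # a random, "untrained" layer
print(powerlaw_alpha(W))
```

In practice you would run the WeightWatcher tool itself (`pip install weightwatcher`) over a trained model and inspect the per-layer alpha values it reports, rather than rolling your own estimator like this.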
Download a complete transcript of this episode by filling out the form below:
- A video version of this conversation is available on our YouTube channel.
- Model Monitoring Enables Robust Machine Learning Applications
- Michael Mahoney: “Tools for building robust, state-of-the-art machine learning models”
- Neil Thompson: “The Computational Limits of Deep Learning”
- Steven Feng and Eduard Hovy: “Data Augmentation in Natural Language Processing”
- Connor Leahy: “Training and Sharing Large Language Models”
- Marco Ribeiro: “Testing Natural Language Models”
- Rumman Chowdhury: “Responsible AI meets Reality”
Subscribe to our Newsletter:
We also publish a popular newsletter where we share highlights from recent episodes, trends in AI / machine learning / data, and a collection of recommendations.
[Image: Old Control Panel by Sergey Svechnikov on Unsplash.]