The Data Exchange Podcast: Hamel Husain on CI/CD for ML, MLOps tools and processes, and how much software engineering should data scientists know.
Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS.
This week’s guest is Hamel Husain, Staff Machine Learning Engineer at GitHub and a core developer for fastai. Prior to GitHub, Hamel worked on machine learning applications and systems at Airbnb and DataRobot.
I saw a pull request (PR), where someone was making a change to a model. And the comment was “Does this make the model better?” The reply was “Yeah, it makes them all better.”
I keep seeing this PR over and over again. This is the PR that haunts me: different versions of the same PR, and this comment: “You may have changed the model, but is it good?” A response could be “Yeah, I think so,” or “Here’s a screenshot of a Jupyter Notebook, or here’s something else.” Those answers are really problematic to me, and they are problematic to everybody else.
Basically, you have no tests, or maybe you have certain tests, but you still have this lingering question: should this code really be merged? What changed? What is the result of this change? That can be very ambiguous with machine learning. You shouldn’t have to guess if something will break. Why are we doing this?
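One way to picture the alternative to a screenshot is an automated check that scores the current model and the proposed model on the same held-out examples and reports the difference on the PR. The Python sketch below is a minimal illustration under stated assumptions, not a description of GitHub's or Hamel's actual tooling: the names `EvalReport`, `evaluate`, and `merge_gate` are hypothetical, accuracy stands in for whatever metric a team actually cares about, and the data and "models" are toys.

```python
from dataclasses import dataclass


@dataclass
class EvalReport:
    """Metrics for one model on a fixed, held-out evaluation set."""
    accuracy: float
    n_examples: int


def evaluate(predict, examples):
    """Score a predict(x) -> label callable on a list of (x, label) pairs."""
    correct = sum(1 for x, y in examples if predict(x) == y)
    return EvalReport(accuracy=correct / len(examples), n_examples=len(examples))


def merge_gate(baseline, candidate, min_gain=0.0):
    """Return (ok, message); ok is False when the accuracy gain is below min_gain."""
    delta = candidate.accuracy - baseline.accuracy
    ok = delta >= min_gain
    verdict = "PASS" if ok else "FAIL"
    message = (
        f"{verdict}: accuracy {baseline.accuracy:.3f} -> {candidate.accuracy:.3f} "
        f"({delta:+.3f}) on {candidate.n_examples} examples"
    )
    return ok, message


if __name__ == "__main__":
    # Toy evaluation set and toy models, purely for illustration.
    examples = [(0, 0), (1, 1), (2, 0), (3, 1)]
    baseline = evaluate(lambda x: 0, examples)       # always predicts 0
    candidate = evaluate(lambda x: x % 2, examples)  # predicts the parity of x
    ok, message = merge_gate(baseline, candidate)
    print(message)  # text a CI job could post as a PR comment
```

In a real pipeline, a CI job would presumably run something like this on every PR and fail the check when `ok` is false, so the answer to “is it good?” is a number attached to the PR rather than a guess.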
Highlights in the video version:
- State of Processes/Tools for Model Validation & Deployment
- CI/CD and machine learning
- If MLOps tools are still lagging, why are people rushing to use ML in production?
- A real-world example of model retraining at GitHub
- We can’t expect all data scientists to become software engineers
- ML and Data Science tools Hamel has been using recently
- Low-code and No-code in data and ML
- ML at GitHub
- How Hamel uses GitHub Copilot
Related content:
- A video version of this conversation is available on our YouTube channel.
- “Model Monitoring Enables Robust Machine Learning Applications”
- “Data Quality Unpacked”
- “Why you should build your AI Applications with Ray”
- Charles Martin: “An oscilloscope for deep learning”
- Sean Taylor: “Changes to the data science role and to data science tools”
- Rumman Chowdhury: “The State of Responsible AI”
- Chris White: “Towards a next-generation dataflow orchestration and automation system”
Subscribe to our Newsletter:
We also publish a popular newsletter where we share highlights from recent episodes, trends in AI / machine learning / data, and a collection of recommendations.
[Photo by Paul Skorupskas on Unsplash.]