Deploying Machine Learning Models Safely and Systematically

The Data Exchange Podcast: Hamel Husain on CI/CD for ML, MLOps tools and processes, and how much software engineering should data scientists know.


This week’s guest is Hamel Husain, Staff Machine Learning Engineer at GitHub and a core developer for fastai. Prior to GitHub, Hamel worked on machine learning applications and systems at Airbnb and DataRobot.


Hamel Husain:

I saw a pull request (PR) where someone was making a change to a model. The comment was “Does this make the model better?” The reply was “Yeah, it makes them all better.”

I keep seeing this PR over and over again. This is the PR that haunts me: different versions of the same PR, with this comment: “You may have changed the model, but is it good?” A response could be “Yeah, I think so,” or “Here’s a screenshot of a Jupyter Notebook,” or “Here’s something else.” Those answers are really problematic to me, and they are problematic to everybody else.

Basically, you have no tests, or maybe you have certain tests, but you still have this lingering question: should this code really be merged? What changed? What is the result of this change? That can be very ambiguous with machine learning. You shouldn’t have to guess if something will break. Why are we doing this?
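
One way to replace "Yeah, I think so" with evidence is an automated check that scores the current model and the proposed model on the same held-out labels and reports the difference on the PR. The Python sketch below is only an illustration of that idea, not something described in the episode: the names `EvalReport`, `accuracy`, `compare_models`, and the `min_gain` threshold are all hypothetical, and accuracy stands in for whatever metric a team actually tracks.

```python
from dataclasses import dataclass


@dataclass
class EvalReport:
    """Hypothetical summary a CI job could post as a PR comment."""
    baseline: float
    candidate: float
    min_gain: float

    @property
    def delta(self) -> float:
        # Positive means the candidate scored higher than the baseline.
        return self.candidate - self.baseline

    @property
    def passed(self) -> bool:
        # The change passes only if it improves the metric by at least min_gain.
        return self.delta >= self.min_gain

    def summary(self) -> str:
        verdict = "PASS" if self.passed else "FAIL"
        return (
            f"{verdict}: baseline={self.baseline:.4f} "
            f"candidate={self.candidate:.4f} delta={self.delta:+.4f}"
        )


def accuracy(predictions, labels) -> float:
    """Fraction of predictions that match the labels."""
    if len(labels) == 0 or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def compare_models(baseline_preds, candidate_preds, labels, min_gain=0.0) -> EvalReport:
    """Score both models on the same held-out labels (assumed setup)."""
    return EvalReport(
        baseline=accuracy(baseline_preds, labels),
        candidate=accuracy(candidate_preds, labels),
        min_gain=min_gain,
    )


if __name__ == "__main__":
    # Toy data standing in for a real held-out evaluation set.
    labels = [1, 0, 1, 1]
    report = compare_models([1, 0, 0, 0], [1, 0, 1, 0], labels)
    print(report.summary())
    # A CI job could exit non-zero here to block the merge.
    raise SystemExit(0 if report.passed else 1)
```

With a check like this wired into CI, the reviewer's question "is it good?" gets answered by a number attached to the PR rather than by a screenshot.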

[Photo by Paul Skorupskas on Unsplash.]