Deploying Machine Learning Models Safely and Systematically

The Data Exchange Podcast: Hamel Husain on CI/CD for ML, MLOps tools and processes, and how much software engineering data scientists should know.


This week’s guest is Hamel Husain, Staff Machine Learning Engineer at GitHub and a core developer for fastai. Prior to GitHub, Hamel worked on machine learning applications and systems at Airbnb and DataRobot.


Hamel Husain:

I saw a pull request (PR) where someone was making a change to a model. The comment was “Does this make the model better?” The reply was “Yeah, it makes them all better.”

I keep seeing this PR over and over again. This is the PR that haunts me: different versions of the same PR, with this comment: “You may have changed the model, but is it good?” A response could be “Yeah, I think so,” or “Here’s a screenshot of a Jupyter Notebook,” or “Here’s something else.” Those answers are really problematic to me, and they are problematic to everybody else.

Basically, you have no tests, or maybe you have certain tests, but you still have this lingering question: should this code really be merged? What changed? What is the result of this change? That can be very ambiguous with machine learning. You shouldn’t have to guess whether something will break, or why a change is being made.
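One way to make the reviewer's question ("does this make the model better?") answerable without a screenshot is to have CI compare the changed model against the current one on the same held-out data. The following is a minimal Python sketch of that idea, not something described in the episode: the names `accuracy`, `compare_models`, `GateResult`, and the `min_gain` threshold are hypothetical, and accuracy stands in for whatever metric a team actually tracks.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool   # True when the candidate meets the required gain
    message: str   # Human-readable summary suitable for a PR comment


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("evaluation set is empty")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def compare_models(y_true, baseline_pred, candidate_pred, min_gain=0.0):
    """Compare candidate and baseline predictions on the same labels.

    min_gain is a hypothetical threshold: the candidate's accuracy must
    exceed the baseline's by at least this amount for the gate to pass.
    """
    base = accuracy(y_true, baseline_pred)
    cand = accuracy(y_true, candidate_pred)
    delta = cand - base
    message = f"baseline={base:.3f} candidate={cand:.3f} delta={delta:+.3f}"
    return GateResult(passed=delta >= min_gain, message=message)


if __name__ == "__main__":
    # Toy labels and predictions, made up purely for illustration.
    labels = [1, 0, 1, 1]
    result = compare_models(labels, [1, 0, 0, 0], [1, 0, 1, 0])
    print(result.message)
    # A CI job could fail the check when the gate does not pass.
    raise SystemExit(0 if result.passed else 1)
```

In a CI job, the printed summary could be posted back to the PR, so the answer to "is it good?" is a number attached to the change rather than a guess.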

Highlights in the video version:

State of Processes/Tools for Model Validation & Deployment

CI/CD and machine learning

If MLOps tools are still lagging, why are people rushing to use ML in production?

A real-world example of model retraining at GitHub

We can’t expect all data scientists to become software engineers

ML and Data Science tools Hamel has been using recently

MLOps

Low-code and No-code in data and ML

ML at GitHub

How Hamel uses GitHub Copilot

Related content:

A video version of this conversation is available on our YouTube channel.
“Model Monitoring Enables Robust Machine Learning Applications”
“Data Quality Unpacked”
“Why you should build your AI Applications with Ray”
Charles Martin: “An oscilloscope for deep learning”
Sean Taylor: “Changes to the data science role and to data science tools”
Rumman Chowdhury: “The State of Responsible AI”
Chris White: “Towards a next-generation dataflow orchestration and automation system”


[Photo by Paul Skorupskas on Unsplash.]

