
Testing Natural Language Models

The Data Exchange Podcast: Marco Ribeiro on why accuracy on benchmarks is not sufficient for evaluating NLP models.


Subscribe: Apple, Android, Spotify, Stitcher, Google, RSS.

In this episode of the Data Exchange I speak with Marco Ribeiro, Senior Researcher at Microsoft Research and lead author of the award-winning paper “Beyond Accuracy: Behavioral Testing of NLP Models with CheckList”. As machine learning gains importance across many application domains and industries, there is a growing need to formalize how ML models get built, deployed, and used. MLOps is an emerging set of practices, drawing on ideas from CI/CD, that focuses on productionizing the machine learning lifecycle. But even before we talk about deploying a model to production, how do we inject more rigor into the model development process?


Marco and his collaborators address this question, at least in the context of natural language models, in their well-received paper. Recall that NLP model training tends to follow a simple process: split your data into training and validation sets, build a model using the training subset, and measure its performance on the validation set. As Marco and his collaborators point out:


CheckList is an open source project for testing your NLP models. It builds on behavioral (or black-box) testing, a longstanding software testing methodology that focuses on validating the input-output behavior of a system.
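To make the contrast concrete, here is a minimal sketch in plain Python of how a model can look perfect on a held-out validation set and still fail simple input-output checks. Everything in it is an assumption made for illustration: the keyword-counting `toy_sentiment` model, the four validation sentences, and the two hand-rolled checks (`negation_test`, `name_invariance_test`) are hypothetical and do not use the CheckList library's own API.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "model": counts sentiment keywords and ignores negation.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def toy_sentiment(text: str) -> str:
    """Return 'pos' or 'neg' based only on keyword counts."""
    words = text.lower().replace(".", "").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "pos" if score >= 0 else "neg"


def accuracy(model: Callable[[str], str], examples: List[Tuple[str, str]]) -> float:
    """Fraction of labeled examples the model predicts correctly."""
    correct = sum(model(text) == label for text, label in examples)
    return correct / len(examples)


# Made-up held-out validation set (illustrative data, not from the paper).
validation = [
    ("I love this airline.", "pos"),
    ("The food was great.", "pos"),
    ("The service was awful.", "neg"),
    ("I hate the delays.", "neg"),
]


def negation_test(model: Callable[[str], str]) -> List[str]:
    """Behavioral check: 'not <positive adjective>' should be predicted negative."""
    cases = [f"The flight was not {adj}." for adj in ("good", "great")]
    return [case for case in cases if model(case) != "neg"]


def name_invariance_test(model: Callable[[str], str]) -> List[str]:
    """Behavioral check: swapping a person's name should not change the prediction."""
    template = "{name} said the crew was great."
    preds = {n: model(template.format(name=n)) for n in ("Maria", "John", "Wei")}
    baseline = preds["Maria"]
    return [n for n, p in preds.items() if p != baseline]


val_accuracy = accuracy(toy_sentiment, validation)
negation_failures = negation_test(toy_sentiment)
invariance_failures = name_invariance_test(toy_sentiment)

print(f"validation accuracy: {val_accuracy:.2f}")
print(f"negation failures: {len(negation_failures)} of 2")
print(f"name-invariance failures: {len(invariance_failures)}")
```

The toy model scores 100% on the validation set, passes the name-swap check, and fails both negation cases. A targeted behavioral test surfaces exactly this kind of gap, which a single accuracy number hides.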

CheckList provides tools for testing natural language models across many different capabilities including:

As machine learning and natural language models continue to grow in importance, companies need to inject more rigor into their model development, deployment, and monitoring processes. Borrowing from longstanding practices in software engineering, CheckList should be a welcome addition to the toolbox of any developer building natural language models.






[Image by Okan Caliskan from Pixabay]
