Improving the robustness of natural language applications

The Data Exchange Podcast: Jack Morris on adversarial attacks, data augmentation, and adversarial training in NLP.


In this episode of the Data Exchange I speak with Jack Morris, a member of Google’s AI Residency program. He is co-creator of TextAttack, an open source framework for adversarial attacks, data augmentation, and adversarial training in NLP (paper, code).


Adversarial examples are inputs crafted to fool a machine learning model. In recent years, adversarial attacks against computer vision models have been covered in numerous media articles. Similar attacks have surfaced for NLP models, and a series of research projects has been dedicated to generating adversarial examples and defending against them. In fact, adversarial attacks against language applications are an active research area, with several recent examples of attacks against language models.

So how exactly does one mount an attack against a language model? In computer vision, one can attempt to fool a model by manipulating a few pixels or frames. Attacks on language models are harder to mount, but Jack described some of the general ways one might go about it:

    You brought up words and letters. … There are two branches for those types of attacks: the first tries to find word and phrase replacements that make sense in context (so-called preservation of semantics); the second uses character-level changes. It turns out that for shorter inputs, changing a few characters in a few words is enough to confuse many state-of-the-art NLP models. This is an issue for chatbots.
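To make the character-level branch concrete, here is a minimal sketch in plain Python. It does not use TextAttack: the keyword-matching `toy_classify` model, the `swap_adjacent_chars` perturbation, and the `char_level_attack` loop are all hypothetical names invented for illustration, and a real NLP model is far less brittle than this toy. The point is only to show how a small typo in a single word can flip a prediction.

```python
import random
from typing import Optional

# Hypothetical toy "model": counts exact keyword matches (not a real NLP model).
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def toy_classify(text: str) -> str:
    """Label text as positive only if positive keywords outnumber negative ones."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative"


def swap_adjacent_chars(word: str, rng: random.Random) -> str:
    """Swap two neighboring interior characters; first and last stay in place."""
    if len(word) < 4:
        return word  # too short to perturb without touching the ends
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def char_level_attack(text: str, rng: random.Random) -> Optional[str]:
    """Perturb one word at a time; return the first variant that flips the label."""
    original_label = toy_classify(text)
    words = text.split()
    for idx, word in enumerate(words):
        candidate = words.copy()
        candidate[idx] = swap_adjacent_chars(word, rng)
        perturbed = " ".join(candidate)
        if toy_classify(perturbed) != original_label:
            return perturbed
    return None  # no single-word perturbation changed the prediction


if __name__ == "__main__":
    adversarial = char_level_attack("the food was great", random.Random(0))
    print(adversarial, "->", toy_classify(adversarial))
```

A human reader still understands the perturbed sentence, but the toy model no longer recognizes the keyword, which is the kind of failure the quote describes for short inputs.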

TextAttack unifies adversarial attack methods into a single framework. Its creators decompose NLP attacks into a goal function, a set of constraints, a transformation, and a search method.
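The four-part decomposition can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not TextAttack's actual API: the `toy_model`, the hand-written `SYNONYMS` table, and the function names `transformation`, `max_words_changed`, `goal_reached`, and `exhaustive_search` are all hypothetical and exist only to show how the pieces fit together.

```python
from typing import Callable, List, Optional

# Hypothetical victim model: flags text as negative if it contains a known word.
NEGATIVE_WORDS = {"terrible", "awful", "bad"}


def toy_model(text: str) -> str:
    words = text.lower().split()
    return "negative" if any(w in NEGATIVE_WORDS for w in words) else "positive"


# Transformation: propose candidates by swapping one word for a listed synonym.
SYNONYMS = {"terrible": ["dreadful", "horrible"], "movie": ["film"]}


def transformation(text: str) -> List[str]:
    words = text.split()
    candidates = []
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word.lower(), []):
            candidates.append(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return candidates


# Constraint: a candidate may differ from the original in at most one word.
def max_words_changed(original: str, candidate: str, limit: int = 1) -> bool:
    changed = sum(a != b for a, b in zip(original.split(), candidate.split()))
    return changed <= limit


# Goal function: untargeted attack, succeeds when the predicted label changes.
def goal_reached(original_label: str, candidate: str) -> bool:
    return toy_model(candidate) != original_label


# Search method: try every candidate in order and return the first success.
def exhaustive_search(
    text: str,
    transform: Callable[[str], List[str]],
    constraints: List[Callable[[str, str], bool]],
    goal: Callable[[str, str], bool],
) -> Optional[str]:
    original_label = toy_model(text)
    for candidate in transform(text):
        if all(check(text, candidate) for check in constraints) and goal(
            original_label, candidate
        ):
            return candidate
    return None


if __name__ == "__main__":
    print(exhaustive_search(
        "a terrible movie", transformation, [max_words_changed], goal_reached
    ))
```

Because each component is a separate function, one can swap in a different transformation (for example, character-level edits) or a smarter search without touching the rest, which is the appeal of decomposing attacks this way.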

Main features of TextAttack.

Developers can use TextAttack to test attacks on models and datasets. It comes with dozens of pretrained models, is integrated with Hugging Face Transformers, and supports many tasks, including summarization, machine translation, and all nine tasks from the GLUE benchmark.


[Image by Ben Lorica, from original artwork by John Patrick McKenzie.]