
Data Augmentation in Natural Language Processing

Photo by Ludomił Sawicki on Unsplash

The Data Exchange Podcast: Ed Hovy and Steven Feng on current challenges and future directions for research in data augmentation and in natural language models.



This week’s guests are Steven Feng, a graduate student, and Ed Hovy, a research professor, both at the Language Technologies Institute of Carnegie Mellon University. We discussed their recent survey paper on Data Augmentation Approaches in NLP (GitHub). Data augmentation is an active field of research on techniques for increasing the diversity of training examples without explicitly collecting new data. One key reason such strategies are important is that augmented data can act as a regularizer to reduce overfitting when training models.
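As a rough illustration of what "increasing the diversity of training examples without explicitly collecting new data" can look like, here is a minimal sketch of two simple token-level perturbations: random swap and random deletion. The function names, parameters, and the example sentence are illustrative assumptions, not code from the survey or its GitHub repository.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions swapped."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_delete(tokens, p, rng):
    """Drop each token independently with probability p.

    If every token would be dropped, keep one token at random instead,
    so a non-empty input never yields an empty output.
    """
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, n_copies=3, seed=0):
    """Produce n_copies perturbed variants of a whitespace-tokenized sentence."""
    rng = random.Random(seed)  # seeded so the variants are reproducible
    tokens = sentence.split()
    variants = []
    for _ in range(n_copies):
        perturbed = random_swap(tokens, n_swaps=1, rng=rng)
        perturbed = random_delete(perturbed, p=0.1, rng=rng)
        variants.append(" ".join(perturbed))
    return variants


if __name__ == "__main__":
    # Hypothetical training example; the label would stay unchanged.
    for variant in augment("the movie was surprisingly good"):
        print(variant)
```

Each variant would be paired with the original example's label and added to the training set. Perturbations this crude can change a sentence's meaning, so in practice the perturbation strength and the number of copies are tuned per task.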


We discussed current challenges and future directions for research in data augmentation for NLP. I also took the opportunity to discuss broad trends in NLP and language technologies with Steven and Ed.




2021 NLP Survey

The 2021 NLP Industry Survey is now open and we need your help. The survey takes less than 5 minutes to fill out and in exchange we’ll send you a copy of the survey results + a FREE pass to the 2021 NLP Summit (a virtual conference slated for October).


