Maarten Grootendorst on applying large language models to topic models and fuzzy string matching.
Maarten Grootendorst is a data scientist at IKNL, an institute that strives to reduce the impact of cancer by collecting and unlocking essential and reliable data. More importantly, he’s the author of a few open source libraries that I’ve come to enjoy: BERTopic (topic modeling with transformers and c-TF-IDF), PolyFuzz (fuzzy string matching), and KeyBERT (keyword extraction). These projects package the power of transformers and other leading-edge models behind simple APIs, clear documentation, and visualization tools. I highly recommend these libraries!
Highlights in the video version:
- Inspiration for BERTopic
- Transformers provide accurate models
- Topic modeling
- BERTopic: main priorities and key metrics
- Interesting use cases
- PolyFuzz
- Documentation and pretrained models
- What else have you been paying attention to outside of NLP?

Related content:
- A video version of this conversation is available on our YouTube channel.
- David Blei: Topic Models – Past, Present, Future
- Connor Leahy and Yoav Shoham: Large Language Models
- Speech synthesis technologies will drive the next wave of innovative voice applications
- Resurgence of Conversational AI
- Jack Clark: The 2022 AI Index
- Hilary Mason: Narrative AI
- Leo Meyerovich: The Graph Intelligence Stack

[Image: Puzzles by Ben Lorica; original photos from Unsplash, via Infogram.]