Unleashing the power of large language models

Maarten Grootendorst on applying large language models to topic models and fuzzy string matching.

Maarten Grootendorst is a data scientist at IKNL, an institute that strives to reduce the impact of cancer by collecting and unlocking essential and reliable data. More importantly, he’s the author of a few open source libraries that I’ve come to enjoy: BERTopic (topic modeling with transformers and c-TF-IDF), PolyFuzz (fuzzy string matching), and KeyBERT (keyword extraction). These projects package transformers and other leading-edge models behind simple APIs, clear documentation, and visualization tools. I highly recommend these libraries!

Maarten Grootendorst will be giving a keynote at the upcoming NLP Summit, a FREE virtual conference that takes place October 4-6.

Highlights in the video version:

Inspiration for BERTopic

Transformers provide accurate models

Topic modeling

BERTopic: main priorities and key metrics

Interesting use cases

PolyFuzz

Documentation and pretrained models

What else have you been paying attention to outside of NLP?