Unleashing the power of large language models

Maarten Grootendorst on applying large language models to topic models and fuzzy string matching.

Maarten Grootendorst is a data scientist at IKNL, an institute that strives to reduce the impact of cancer by collecting and unlocking essential and reliable data. More importantly, he’s the author of a few open source libraries that I’ve come to enjoy: BERTopic (topic modeling with transformers and c-TF-IDF), PolyFuzz (fuzzy string matching), and KeyBERT (keyword extraction). These projects package transformers and other leading-edge models behind simple APIs, clear documentation, and visualization tools. I highly recommend these libraries!

Highlights in the video version:

Figure: Keywords (extracted with KeyBERT) from recent “NLP”-related job postings, limited to a few key U.S. technology hubs, July 2022 through early August 2022.

