Building open source developer tools for language applications

The Data Exchange Podcast: Matthew Honnibal on spaCy, Thinc, Prodigy and natural language technologies.


Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS.

In this episode of the Data Exchange I speak with Matthew Honnibal, founder of Explosion AI, a startup focused on building developer tools for AI and natural language processing. Matthew and his team are the creators of popular tools like spaCy (NLP), Thinc (a lightweight deep learning library), and Prodigy (annotation and active learning).

Enjoy a great series of FREE monthly virtual conferences from the organizers of the Ray Summit. The next one, Practical Reinforcement Learning, is scheduled for July 8th. Go to anyscale.com/events to register.

Since its initial release in early 2015, spaCy has become one of the most popular NLP libraries. The introduction of transformers has made NLP and language applications high-priority tools for developers to learn and use. With Python emerging as the dominant programming language in machine learning, spaCy is well-positioned as an extremely fast NLP library built for production applications.

Recent natural language models are enormous: GPT-3 from OpenAI is “an autoregressive language model with 175 billion parameters”. Tools like spaCy and Hugging Face make many cutting-edge language models easily available to developers. RISELab professor Ion Stoica has noted that distributed computing will be an important tool for machine learning developers, so it’s not surprising to see the open source project Ray playing a role in helping spaCy scale to large clusters.

Our conversation focused on a range of topics including:

 
Subscribe to our Newsletter:
We also publish a popular newsletter where we share highlights from recent episodes, trends in AI / machine learning / data, and a collection of recommendations.



[Image by Deedee86 from Pixabay]