The Data Exchange Podcast: Matthew Honnibal on spaCy, Thinc, Prodigy and natural language technologies.
In this episode of the Data Exchange, I speak with Matthew Honnibal, founder of Explosion AI, a startup focused on building developer tools for AI and natural language processing. Matthew and his team are the creators of popular tools like spaCy (NLP), Thinc (a lightweight deep learning library), and Prodigy (annotation and active learning).
Since its initial release in early 2015, spaCy has become one of the most popular NLP libraries. The introduction of transformers has made NLP and language applications high-priority tools for developers to learn and use. With Python emerging as the dominant programming language in machine learning, spaCy is well-positioned as an extremely fast NLP library ready-made for production applications.
Recent natural language models are enormous – GPT-3 from OpenAI is “an autoregressive language model with 175 billion parameters”. Tools like spaCy and Hugging Face make many cutting-edge language models easily available to developers. RISELab Professor Ion Stoica has noted that distributed computing will be an important tool for machine learning developers, so it’s not surprising to see the open source project Ray playing a role in helping spaCy scale to large clusters.
Our conversation focused on a range of topics including:
- Explosion AI and Prodigy
- Distributed computing with Ray
Subscribe to our Newsletter:
We also publish a popular newsletter where we share highlights from recent episodes, trends in AI / machine learning / data, and a collection of recommendations.
Related content:
- David Talby: “Building domain specific natural language applications”
- Evan Sparks: “An open source platform for training deep learning models”
- Dafna Shahaf: “Computational humanness, analogy and innovation, and soft concepts”
- Robert Munro: “Human-in-the-loop machine learning”
- Solmaz Shahalizadeh: “Business at the speed of AI: Lessons from Shopify”
- Chris Nicholson: “Next-generation simulation software will incorporate deep reinforcement learning”