Building open source developer tools for language applications

The Data Exchange Podcast: Matthew Honnibal on spaCy, Thinc, Prodigy and natural language technologies.

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS.

In this episode of the Data Exchange I speak with Matthew Honnibal, founder of Explosion AI, a startup focused on building developer tools for AI and natural language processing. Matthew and his team created popular tools including spaCy (NLP), Thinc (a lightweight deep learning library), and Prodigy (annotation and active learning).

Scalable machine learning, scalable Python, for everyone: Join Michael Jordan, Manuela Veloso, Azalia Mirhoseini, Zoubin Ghahramani, Wes McKinney, Ion Stoica, Gaël Varoquaux, Raluca Popa and many other speakers at the first Ray Summit, a FREE virtual conference which takes place Sep 30th and Oct 1st.

Since its initial release in early 2015, spaCy has become one of the most popular NLP libraries. The introduction of transformers has made NLP and language applications high-priority tools for developers to learn and use. With Python now the dominant programming language in machine learning, spaCy is well positioned as an extremely fast NLP library built for production applications.

Recent natural language models are enormous: GPT-3 from OpenAI is “an autoregressive language model with 175 billion parameters”. Tools like spaCy and Hugging Face's libraries make many cutting-edge language models easily available to developers. RISELab Professor Ion Stoica has noted that distributed computing will be an important tool for machine learning developers, so it’s not surprising to see the open source project Ray playing a role in helping spaCy scale to large clusters.

Our conversation focused on a range of topics including:

spaCy
Thinc
Explosion AI and Prodigy
Distributed computing with Ray

You can view a video version of this conversation on our YouTube channel.

Subscribe to our Newsletter:
We also publish a popular newsletter where we share highlights from recent episodes, trends in AI / machine learning / data, and a collection of recommendations.

Related content:

David Talby: “Building domain specific natural language applications”
Evan Sparks: “An open source platform for training deep learning models”
Dafna Shahaf: “Computational humanness, analogy and innovation, and soft concepts”
Robert Munro: “Human-in-the-loop machine learning”
Solmaz Shahalizadeh: “Business at the speed of AI: Lessons from Shopify”
Chris Nicholson: “Next-generation simulation software will incorporate deep reinforcement learning”

[Image by Deedee86 from Pixabay]
