Building domain-specific natural language applications

The Data Exchange Podcast: David Talby on Spark NLP and turning NLP research into enterprise solutions.

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS.

In this episode of the Data Exchange I speak with David Talby, co-creator of Spark NLP, an open-source, highly scalable, production-grade natural language processing (NLP) library. Spark NLP has become one of the more popular NLP libraries and is available on PyPI, Conda, Maven, and Spark Packages. With recent advances in research on large-scale natural language models, there is strong interest in domain-specific natural language applications. Besides their work on Spark NLP, David and his collaborators are building natural language models tuned specifically for healthcare applications.

Ray Summit has been postponed until the fall. In the meantime, enjoy an amazing series of virtual conferences beginning in mid-May on the theme “Scalable machine learning, scalable Python, for everyone”. Go to for details.

Our conversation spanned many topics, including:

  • Spark NLP: its current status and some common and surprising use cases.
  • Recent developments in NLP research and their implications for companies.
  • Spark NLP for Healthcare.

Our goal in this podcast is to build a community of people interested in data, machine learning, and AI. If you have suggestions for what we should recommend (books, conferences, links) or for guests to book, please visit our site and fill out the “contact” form.

Subscribe to our Newsletter:
We also publish a popular newsletter where we share highlights from recent episodes, trends in AI / machine learning / data, and a collection of recommendations.

[Photo from Free Stock: a woman sitting in a library.]