Unlocking the Power of LLMs with Data Prep Kit

Ben Lorica

2 years ago

Petros Zerfos and Hima Patel on Simplifying AI Data Pipelines with IBM’s Data Prep Kit.

Petros Zerfos and Hima Patel of IBM Research are part of the team behind Data Prep Kit (DPK), an open-source toolkit for processing and preparing raw text and code data at scale for use in large language model applications. We explore how Data Prep Kit handles text, code, and documents, and discuss its scalability, cloud-native architecture, and planned enhancements. We also touch on DPK’s integration with popular tools such as Ray. [Ray Summit 2024 comes to San Francisco September 30-October 2. Use the code AnyscaleBen15 for a 15% discount when you register!]

Related content:

A video version of this conversation is available on our YouTube channel.
Inside the Data Strategies of Top AI Labs
Choosing the Right Vector Search System
Generative AI: Navigating the Challenges of Enterprise Adoption
Chang She → Unlocking the Power of Unstructured Data
Brian Raymond → ETL for LLMs
Jerry Liu → An Open Source Data Framework for LLMs
