Philipp Moritz and Goku Mohandas on optimizing retrieval-augmented generation, hybrid routing, and open source LLMs.
Philipp Moritz (Co-founder and CTO) and Goku Mohandas (ML and Product Lead) of Anyscale[1] do a deep dive into retrieval-augmented generation (RAG) and large language models (LLMs). They recently wrote a must-read article that serves as an essential guide for teams who want to build RAG applications in a principled manner.
Our conversation focuses on retrieval-augmented generation (RAG), in which an LLM answers a user's query using relevant content retrieved from a document corpus. The quality of those answers depends on good embeddings and efficient retrieval algorithms. Chunking splits documents into manageable segments, which lets indexing and retrieval scale to large corpora. Embeddings make it possible to compare a query against those chunks and efficiently pull out the most relevant ones. Applications can also combine multiple LLMs: hybrid routing sends each query to the LLM best suited to handle it, and real-time (or near real-time) processing keeps the system responsive to users. Throughout, the emphasis is on scalable, efficient algorithms and models that deliver high-quality responses quickly while remaining adaptable to specific use cases.
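The pipeline described above (chunk, embed, retrieve, route, then generate) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Anyscale's implementation: the term-frequency `embed` function is a toy stand-in for a real embedding model, the keyword-and-length `route` heuristic stands in for a trained routing classifier, and the model names `small-llm` and `large-llm` are hypothetical. No LLM is called; the sketch stops at building the prompt.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end of the text
            break
    return chunks


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a lowercase term-frequency vector."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def route(query: str) -> str:
    """Hypothetical router: long or reasoning-heavy queries go to a larger model."""
    hard_markers = {"why", "compare", "explain"}
    tokens = tokenize(query)
    if len(tokens) > 20 or hard_markers & set(tokens):
        return "large-llm"  # hypothetical model name
    return "small-llm"      # hypothetical model name


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Ground the LLM's answer in the retrieved chunks."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    corpus = [
        "Ray Tune is a library for hyperparameter tuning.",
        "Ray Serve is a library for model serving.",
        "Chunking splits documents into segments.",
    ]
    question = "Which library handles hyperparameter tuning?"
    top = retrieve(question, corpus, k=1)
    print(route(question))
    print(build_prompt(question, top))
```

Chunk size, overlap, and the number of retrieved chunks `k` are the kinds of configuration choices the conversation discusses evaluating; in a real system you would compare several settings against a quality metric rather than fixing them up front.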

Interview highlights – key sections from the video version:
- Overview of RAG, and how to optimize RAG applications
- How to evaluate different configurations of a RAG application
- What to prioritize: chunking strategies or embedding models?
- Information Retrieval and Ranking
- Real-time or near real-time updates
- RAG applications that leverage multiple LLMs
- Applications that combine RAG and custom LLMs
- What exactly is an “open source” LLM?
- Ray Tune and optimizing RAG applications
Related content:
- A video version of this conversation is available on our YouTube channel.
- Anyscale blog post: Building RAG-based LLM Applications for Production
- Best Practices in Retrieval Augmented Generation
- Open Source Principles in Foundation Models
- Building a Fleet of Custom LLMs
- Ivy: Streamlining AI Model Deployment and Development
- Daniel Lenton: Ivy – The One-Stop Interface for AI Model Deployment and Development
- Michele Catasta: Software Development with AI and LLMs
- Brian Raymond: ETL for LLMs
- Jerry Liu: An Open Source Data Framework for LLMs
If you enjoyed this episode, please support our work by encouraging your friends and colleagues to subscribe to our newsletter:
[1] Ben Lorica is an advisor to Anyscale and other startups.