Chang She on the power of Lance, a columnar data format for AI/ML.
Chang She is CEO and co-founder of LanceDB, an open-source database designed for multimodal AI applications, offering scalable vector search, streaming training data, and interactive exploration of large AI datasets. In this episode we discuss Lance, an open-source columnar data format that tackles the unique challenges posed by modern AI and machine learning workloads. Specifically engineered for efficiency, Lance addresses limitations of existing formats like Parquet and ORC by optimizing the storage and retrieval of large, complex data types, including images, videos, and vector embeddings.
Interview highlights – key sections from the video version:
- Introduction to Lance and the Challenge of Unstructured Data
- Overcoming Limitations of Existing Formats (Parquet, ORC)
- Lance: A New Data Format for AI Workloads
- Efficient Metadata Handling and Wide Data Support in Lance
- Integrated Vector Indexing for AI Applications
- LanceDB: A Scalable Vector Database Built on Lance Format
- Real-World Use Cases: Images, Videos, and Large-Scale Datasets
- Lance as a “One-Stop Shop” for AI Data Lakes
- Comparison to Meta’s Nimble: Similarities and Differences
- Open Source Ecosystem and Community Contributions
- Key Use Cases: Data Exploration, Training, and Vector Search
- Addressing the Limitations of Traditional Vector Search Systems
- Exploratory Data Analysis for Unstructured Data with Lance
- Multimodal Embeddings and Vector Search
- Feature Stores and Their Evolving Role in AI
- Putting LanceDB’s Vector Search to the Test
- Embedding Pipelines, Ecosystem Integrations, and Deployment
- Open Source and Enterprise Offerings from LanceDB
- The Future of Lance: New Encodings, Integrations, and Governance
Related content:
- A video version of this conversation is available on our YouTube channel.
- Previous episodes pertaining to vector search can be found here.
- Choosing the Right Vector Search System
- Is Your Data Strategy Ready for Generative AI?
- Semih Salihoglu → The Intersection of LLMs, Knowledge Graphs, and Query Generation
- Philip Rathle → Supercharging AI with Graphs
- Ajay Kulkarni and Mike Freedman → Postgres: The Swiss Army Knife of Databases
