Yucheng Low on developing tools for enhanced versioning, reproducibility, and collaboration in machine learning and AI.
Yucheng Low, Cofounder & CEO of XetHub, discusses the challenges of managing large-scale machine learning assets and the need for version control over them. He highlights the importance of tracking changes to data and models and collaborating on them, and explains how XetHub's platform addresses these challenges with a versioning system for models and data that includes collaboration capabilities. The platform supports various file types, including images and unstructured data, and XetHub has open-sourced its client-side components for easy integration. The conversation also touches on the challenges of data deletion and on the importance of openness and of not being locked into a single format.
- ❛ MLOps is essentially DevOps, but at a grander scale. If we can scale DevOps to handle repositories of any size and if our Continuous Integration (CI) systems can efficiently manage tasks like GPU-based training, many issues naturally resolve themselves. This would eliminate the need for many tools we currently use. We’re seeing a convergence where machine learning teams are evolving to resemble microservice teams. Instead of teams working solely on one microservice, they work on or integrate with multiple ML model services sourced from various places. Much like microservices, an organization could have dozens or even hundreds of these model services. ❜
– Yucheng Low, Cofounder & CEO of XetHub
Interview highlights – key sections from the video version:
- From Apple to Data Management: Making Data as Versionable as Code
- Designing Data Sharing Platforms: Addressing Structured Data, File Types, and GDPR Challenges
- File formats and interfaces
- Scalable Repository Management: Instant Access, Block-Level Deduplication, and Enhanced Data Visualization
- Storing Valuable Data for Future Use and Model Comparisons
- Collaborative Iteration and Evaluation of Versions in Machine Learning Deployment
- Some early XetHub case studies
- Advanced Data Repository with Versioning, Python API, and Streamlit Integration
- Cost benefits
- Available integrations
- Collaboration features and implications for custom foundation models
Related content:
- A video version of this conversation is available on our YouTube channel.
- Building a Fleet of Custom LLMs: 7 Key Features Every Team Needs
- The Future of Creativity: The Intersection of AI and Copyright
- The Financial Services Sector’s March into Generative AI
- Daniel Lenton: Ivy – The One-Stop Interface for AI Model Deployment and Development
- Michele Catasta: Software Development with AI and LLMs
- Brian Raymond: ETL for LLMs
- Jerry Liu: An Open Source Data Framework for LLMs
- Steve Hsu: Using LLMs to Build AI Co-pilots for Knowledge Workers