The Data Exchange

Versioning and MLOps for Generative AI

Yucheng Low on developing tools for enhanced versioning, reproducibility, and collaboration in machine learning and AI.

Subscribe: Apple • Spotify • Overcast • Google • AntennaPod • Podcast Addict • Amazon • RSS.

Yucheng Low, Cofounder & CEO of XetHub, discusses the challenges of managing large-scale machine learning assets and the need for version control. He highlights the importance of tracking changes and collaborating on data and models, and explains how XetHub’s platform addresses these challenges with a versioning system for models and data that includes collaboration capabilities. The platform supports various file types, including images and unstructured data, and XetHub has open-sourced its client-side code for easy integration. The conversation also touches on the challenges of data deletion and the importance of openness and of not being locked into a single format.

Subscribe to the Gradient Flow Newsletter

Interview highlights – key sections from the video version:

  1. From Apple to Data Management: Making Data as Versionable as Code
  2. Designing Data Sharing Platforms: Addressing Structured Data, File Types, and GDPR Challenges
  3. File formats and interfaces
  4. Scalable Repository Management: Instant Access, Block-Level Deduplication, and Enhanced Data Visualization
  5. Storing Valuable Data for Future Use and Model Comparisons
  6. Collaborative Iteration and Evaluation of Versions in Machine Learning Deployment
  7. Some early XetHub case studies
  8. Advanced Data Repository with Versioning, Python API, and Streamlit Integration
  9. Cost benefits
  10. Available integrations
  11. Collaboration features and implications for custom foundation models

If you enjoyed this episode, please support our work by encouraging your friends and colleagues to subscribe to our newsletter.
