Jiwoo Hong and Noah Lee on Streamlining Language Model Training with Odds Ratio Preference Optimization.
Jiwoo Hong and Noah Lee of KAIST AI are co-authors of ORPO: Monolithic Preference Optimization without Reference Model. ORPO uses the odds ratio to learn preferences directly during supervised fine-tuning, with no separate reference model, and requires significantly smaller preference datasets than traditional alignment methods such as RLHF and DPO. The method has drawn interest from both the research community and industry because of its efficiency, scalability, and potential to mitigate bias in language models.
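To make the idea concrete, here is a minimal sketch of the ORPO objective as described in the paper: a standard supervised fine-tuning loss on the chosen response plus a λ-weighted penalty on the log odds ratio between chosen and rejected responses. The function names and the use of average per-token log-probabilities as inputs are illustrative assumptions, not the authors' reference implementation.

```python
import math

def log_odds(avg_logp):
    # odds(y|x) = P(y|x) / (1 - P(y|x)), computed from the
    # average per-token log-probability of the sequence
    p = math.exp(avg_logp)
    return math.log(p) - math.log(1.0 - p)

def orpo_loss(avg_logp_chosen, avg_logp_rejected, lam=0.1):
    """Per-example ORPO objective (sketch): SFT negative log-likelihood
    on the chosen response plus a lambda-weighted odds-ratio term."""
    sft_loss = -avg_logp_chosen
    # log odds ratio of chosen over rejected, pushed up via log-sigmoid
    ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    or_loss = -math.log(1.0 / (1.0 + math.exp(-ratio)))
    return sft_loss + lam * or_loss
```

Because the penalty is a function of the model's own odds rather than a ratio against a frozen reference policy, a single model and a single pass over one preference dataset suffice, which is where the data and compute savings discussed in the episode come from.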
Interview highlights – key sections from the video version:
- ORPO (Odds Ratio Preference Optimization) and how it combines supervised fine-tuning and preference alignment
- The Odds Ratio
- ORPO’s Objective Function and Dataset Size
- Dataset Size Comparison with RLHF
- ORPO’s Scalability and Model Size
- Data Requirements for Specific Tasks
- Comparison with Other Methods
- Single Dataset Approach and Preference Alignment
- The Nature of the ORPO Dataset
- ORPO’s Performance Compared to Traditional Methods
- ORPO’s Place in the AI Toolbox
- Evidence of ORPO’s Effectiveness
- ORPO and Bias Mitigation
- Adaptability
- Implementation of ORPO
- Community and Industry Reaction to ORPO
- Creating ORPO Datasets
- ORPO’s Efficiency and Future Directions
Related content:
- A video version of this conversation is available on our YouTube channel.
- Notebook: Fine-tune Llama 3 with ORPO
- From Supervised Fine-Tuning to Online Feedback
- Customizing LLMs: When to Choose LoRA or Full Fine-Tuning
- Ken Liu → Machine Unlearning: Techniques, Challenges, and Future Directions
- Scaling Dictionary Learning for Safer AI Models
- The Art of Forgetting: Demystifying Unlearning in AI Models
- Is Your Data Strategy Ready for Generative AI?
- Nestor Maslej → 2024 Artificial Intelligence Index
If you enjoyed this episode, please support our work by encouraging your friends and colleagues to subscribe to our newsletter.