Integrating Fine-tuning and Preference Alignment in a Single Streamlined Process

Jiwoo Hong and Noah Lee on Streamlining Language Model Training with Odds Ratio Preference Optimization.

Subscribe: Apple • Spotify • Overcast • Pocket Casts • AntennaPod • Podcast Addict • Amazon • RSS.

Jiwoo Hong and Noah Lee of KAIST AI are co-authors of ORPO: Monolithic Preference Optimization without Reference Model. ORPO uses the odds ratio to learn preferences during fine-tuning itself, rather than in a separate alignment stage, and requires significantly smaller datasets than traditional methods like RLHF and DPO. The method has drawn interest from both the research community and industry for its efficiency, scalability, and potential to mitigate bias in language models.
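To make the idea concrete, here is a minimal sketch of the ORPO objective in PyTorch: the usual supervised fine-tuning loss (negative log-likelihood of the preferred response) plus a penalty on the log odds ratio between the chosen and rejected completions. This assumes chosen_logps and rejected_logps are length-averaged per-token log-probabilities under the current policy; the function name orpo_loss and the lam weight are illustrative choices, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps: torch.Tensor,
              rejected_logps: torch.Tensor,
              lam: float = 0.1) -> torch.Tensor:
    """Sketch of the ORPO objective (hypothetical helper, not the
    authors' code): SFT negative log-likelihood on the chosen response
    plus an odds-ratio penalty, with no reference model.

    chosen_logps / rejected_logps: average per-token log-probabilities
    of each completion under the current policy, shape (batch,).
    lam: weight on the odds-ratio term (illustrative default).
    """
    # log odds(y|x) = log p(y|x) - log(1 - p(y|x)),
    # computed stably from log-probabilities via log1p(-exp(.)).
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Odds-ratio term: -log sigmoid of the log odds ratio, which rewards
    # raising the odds of the chosen response over the rejected one.
    ratio_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected)

    # SFT term: negative log-likelihood of the chosen response.
    sft_loss = -chosen_logps

    return (sft_loss + lam * ratio_loss).mean()
```

Because the odds ratio is computed from the policy's own probabilities, no frozen reference model is needed, which is what makes the procedure monolithic: fine-tuning and preference alignment happen in the same training pass.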


Interview highlights – key sections from the video version:


Related content:


If you enjoyed this episode, please support our work by encouraging your friends and colleagues to subscribe to our newsletter: