TAG: Preference Learning

2023-12 [DPO] Direct Preference Optimization: Your Language Model is Secretly a Reward Model	2024/02/07 09:50	Hyunsoo Park
2024-01 [SPO] A Minimaximalist Approach to Reinforcement Learning from Human Feedback	2024/01/11 00:20	Hyunsoo Park
2024-01 ARGS: Alignment as Reward-Guided Search	2024/02/10 13:47	Hyunsoo Park
2024-01 Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation	2024/01/23 03:24	Hyunsoo Park
2024-01 WARM: On the Benefits of Weight Averaged Reward Models	2024/01/23 14:28	Hyunsoo Park