RLHF, Preference Learning
- 2024-01 WARM: On the Benefits of Weight Averaged Reward Models
- 2024-01 Secrets of RLHF in Large Language Models Part II: Reward Modeling
- 2024-01 Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
- 2024-01 [SPO] A Minimaximalist Approach to Reinforcement Learning from Human Feedback
- 2024-01 ARGS: Alignment as Reward-Guided Search
- 2023-12 [DPO] Direct Preference Optimization: Your Language Model is Secretly a Reward Model (see the loss sketch after this list)
- 2023-10 Vanishing Gradients in Reinforcement Finetuning of Language Models
- 2023-10 [IPO] A General Theoretical Paradigm to Understand Learning from Human Preferences
- 2023-06 Secrets of RLHF in Large Language Models Part I: PPO
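
The central result of the [DPO] paper above is that the KL-regularized RLHF objective can be optimized with a simple classification-style loss over preference pairs, with no explicit reward model or PPO loop. Below is a minimal PyTorch sketch of that loss; the function and tensor names and the beta value are illustrative, not taken from the paper's released code.

<code python>
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss (Rafailov et al., 2023).

    Each input is a batch of summed per-token log-probabilities for the
    preferred (chosen) or dispreferred (rejected) response, computed under
    either the trainable policy or the frozen reference model. beta plays
    the role of the KL-penalty coefficient in standard RLHF.
    """
    # Implicit reward is proportional to the policy-to-reference log-ratio.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # Bradley-Terry preference logit between chosen and rejected responses.
    logits = beta * (chosen_logratios - rejected_logratios)
    # -log sigmoid(logits): push the policy to prefer chosen over rejected.
    return -F.logsigmoid(logits).mean()
</code>

For contrast, the [IPO] paper above argues this log-sigmoid objective can push the preference margin to grow without bound, and instead regresses the same log-ratio gap toward a fixed target of 1/(2*beta) with a squared loss.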