RLHF, Preference Learning
- 2024-01 WARM: On the Benefits of Weight Averaged Reward Models
- 2024-01 Secrets of RLHF in Large Language Models Part II: Reward Modeling
- 2024-01 Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
- 2024-01 [SPO] A Minimaximalist Approach to Reinforcement Learning from Human Feedback
- 2024-01 ARGS: Alignment as Reward-Guided Search
- 2023-12 [DPO] Direct Preference Optimization: Your Language Model is Secretly a Reward Model (see the loss sketch after this list)
- 2023-10 Vanishing Gradients in Reinforcement Finetuning of Language Models
- 2023-10 [IPO] A General Theoretical Paradigm to Understand Learning from Human Preferences
- 2023-06 Secrets of RLHF in Large Language Models Part I: PPO
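
The central result of the [DPO] paper above is that the KL-regularized RLHF objective can be optimized with a simple classification-style loss over preference pairs, with no explicit reward model or PPO loop. Below is a minimal PyTorch sketch of that loss; the function and tensor names and the beta value are illustrative, not taken from the paper's released code.

<code python>
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss (Rafailov et al., 2023).

    Each input is a batch of summed per-token log-probabilities for the
    preferred (chosen) or dispreferred (rejected) response, computed under
    either the trainable policy or the frozen reference model. beta plays
    the role of the KL-penalty coefficient in standard RLHF.
    """
    # Implicit reward is proportional to the policy-to-reference log-ratio.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # Bradley-Terry preference logit between chosen and rejected responses.
    logits = beta * (chosen_logratios - rejected_logratios)
    # -log sigmoid(logits): push the policy to prefer chosen over rejected.
    return -F.logsigmoid(logits).mean()
</code>

For contrast, the [IPO] paper above argues this log-sigmoid objective can push the preference margin to grow without bound, and instead regresses the same log-ratio gap toward a fixed target of 1/(2*beta) with a squared loss.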