
topic:rlhf

RLHF, Preference Learning

  • 2024-01 WARM: On the Benefits of Weight Averaged Reward Models
  • 2024-01 Secrets of RLHF in Large Language Models Part II: Reward Modeling
  • 2024-01 Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
  • 2024-01 [SPO] A Minimaximalist Approach to Reinforcement Learning from Human Feedback
  • 2024-01 ARGS: Alignment as Reward-Guided Search
  • 2023-12 [DPO] Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  • 2023-10 Vanishing Gradients in Reinforcement Finetuning of Language Models
  • 2023-10 [IPO] A General Theoretical Paradigm to Understand Learning from Human Preferences
  • 2023-06 Secrets of RLHF in Large Language Models Part I: PPO
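
Several of the entries above (DPO, IPO, SPO, CPO) replace the explicit reward model of PPO-based RLHF with a loss defined directly on preference pairs. As a minimal sketch of that idea, the snippet below implements the DPO objective from the Rafailov et al. paper listed above. It assumes PyTorch and that the caller has already computed summed per-sequence log-probabilities for the chosen and rejected responses under both the trainable policy and the frozen reference model; the function name and the beta value are illustrative only.

<code python>
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO objective on a batch of preference pairs.

    Each argument is the summed log-probability of a whole response
    (chosen y_w or rejected y_l) under the policy or the frozen
    reference model; beta scales the implicit KL penalty.
    """
    # log [ pi_theta(y|x) / pi_ref(y|x) ] for chosen and rejected responses
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * (chosen margin - rejected margin)), averaged over the batch
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
</code>

In practice these log-probabilities come from a forward pass over the concatenated prompt and response tokens, with the reference model kept frozen throughout training.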
