
rlhf

TAG: rlhf

  • 2023-06 Secrets of RLHF in Large Language Models Part I: PPO
    2024/02/07 08:28 · Hyunsoo Park
  • 2023-10 [IPO] A General Theoretical Paradigm to Understand Learning from Human Preferences
    2024/02/07 09:55 · Hyunsoo Park
  • 2023-10 Vanishing Gradients in Reinforcement Finetuning of Language Models
    2024/02/02 05:52 · Hyunsoo Park
  • 2023-12 [DPO] Direct Preference Optimization: Your Language Model is Secretly a Reward Model
    2024/02/07 09:50 · Hyunsoo Park
  • 2024-01 [SPO] A Minimaximalist Approach to Reinforcement Learning from Human Feedback
    2024/01/11 00:20 · Hyunsoo Park
  • 2024-01 ARGS: Alignment as Reward-Guided Search
    2024/02/10 13:47 · Hyunsoo Park
  • 2024-01 Secrets of RLHF in Large Language Models Part II: Reward Modeling
    2024/02/07 08:30 · Hyunsoo Park
  • 2024-01 WARM: On the Benefits of Weight Averaged Reward Models
    2024/01/23 14:28 · Hyunsoo Park
