Out of the Box

topic:llm_fine_tuning

This is an old revision of the document!


Table of Contents

  • LLM Fine-Tuning
      • PEFT
      • RLHF

LLM Fine-Tuning

  • Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU
  • Fine Tuning LLMs on a Single Consumer Graphic Card
  • Phinetuning 2.0
  • Code LoRA from Scratch
  • PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware

PEFT

  • 2024-10 Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
  • 2024-03 Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
  • 2024-03 GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  • 2023-12 Batched Low-Rank Adaptation of Foundation Models

RLHF

  • 2024-01 WARM: On the Benefits of Weight Averaged Reward Models
  • 2024-01 Secrets of RLHF in Large Language Models Part II: Reward Modeling
  • 2024-01 Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
  • 2024-01 [SPO] A Minimaximalist Approach to Reinforcement Learning from Human Feedback
  • 2024-01 ARGS: Alignment as Reward-Guided Search
  • 2023-12 [DPO] Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  • 2023-10 Vanishing Gradients in Reinforcement Finetuning of Language Models
  • 2023-10 [IPO] A General Theoretical Paradigm to Understand Learning from Human Preferences
  • 2023-06 Secrets of RLHF in Large Language Models Part I: PPO
/var/www/html/data/pages/topic/llm_fine_tuning.txt · Last modified: 2024/03/23 02:42 by 127.0.0.1

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International