2024-01 ReFT: Reasoning with Reinforced Fine-Tuning
https://arxiv.org/abs/2401.08967
ReFT, LLM, RL, SFT, reasoning, ByteDance, 2024