2024-01 ReFT: Reasoning with Reinforced Fine-Tuning

https://arxiv.org/abs/2401.08967

ReFT, LLM, RL, SFT, reasoning, ByteDance, 2024