2024-01 Self-Rewarding Language Models
https://arxiv.org/abs/2401.10020
https://github.com/lucidrains/self-rewarding-lm-pytorch
self-play learning, self-reward, self-learning, RL, LLM, 2024, Meta