What I Read: reinforcement learning

Posted on 2025-06-12 :: Tags: machine learning, reinforcement learning, large language model, reward, regularization, overfitting, policy, optimization

https://www.let-all.com/blog/2025/03/05/the-interface-between-reinforcement-learning-theory-and-language-model-post-training/
The Interface Between Reinforcement Learning Theory and Language Model Post-Training
Akshay Krishnamurthy, Audrey Huang
March 5, 2025
"Even though existing RLHF methods... employ KL-regularization to prevent deviating from the data collection policy \pi_{\mathrm{ref}}, the fact that these methods overfit suggests that they are not adequately regularized...."
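For readers who want the quoted KL-regularization made concrete, here is a minimal sketch. It is not taken from the post: it assumes a toy setting with a single prompt and a handful of candidate responses, and the function names, the `beta` parameter, and the example numbers are all illustrative. It uses the standard KL-regularized objective (expected reward minus `beta` times the KL divergence from the reference policy) and its well-known closed-form maximizer, which reweights the reference policy by `exp(reward / beta)`.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_regularized_objective(policy, ref_policy, rewards, beta):
    """Expected reward under `policy` minus beta * KL(policy || ref_policy)."""
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    return expected_reward - beta * kl_divergence(policy, ref_policy)

def optimal_policy(ref_policy, rewards, beta):
    """Closed-form maximizer: pi*(y) is proportional to pi_ref(y) * exp(r(y) / beta)."""
    weights = [q * math.exp(r / beta) for q, r in zip(ref_policy, rewards)]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    # Hypothetical numbers: three candidate responses to one prompt.
    ref_policy = [0.5, 0.3, 0.2]
    rewards = [0.0, 1.0, 2.0]  # scores from an (assumed) learned reward model
    for beta in (10.0, 1.0, 0.1):
        pi = optimal_policy(ref_policy, rewards, beta)
        print(beta, [round(p, 3) for p in pi], round(kl_divergence(pi, ref_policy), 3))
```

As `beta` shrinks, the policy drifts further from the reference and piles its mass onto whatever the reward model scores highest. If those scores are noisy, that drift is one way the overfitting described in the quote can show up.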