What I Read: reward hacking.
https://lilianweng.github.io/posts/2024-11-28-reward-hacking/
Reward Hacking in Reinforcement Learning
Lilian Weng
November 28, 2024
“Reward hacking occurs when a reinforcement learning (RL) agent exploits flaws or ambiguities in the reward function to achieve high rewards, without genuinely learning or completing the intended task.”
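To make the definition concrete, here is a minimal sketch of a flawed reward being exploited. Everything in it is an assumption made for illustration and does not come from the post: the six-cell track, the checkpoint at cell 2, the 10-step horizon, and the names `run`, `intended_policy`, and `hacking_policy`. The intended task is to reach the goal cell, but the proxy reward only pays for entering the checkpoint, so a policy that steps on and off the checkpoint collects more reward than one that finishes the task.

```python
# Toy illustration (hypothetical environment, not taken from the quoted post).
# Track cells are 0..5. The intended task is to reach cell 5 (the goal).
# The flawed proxy reward pays +1 every time the agent enters cell 2.

GOAL = 5
CHECKPOINT = 2
HORIZON = 10


def run(policy):
    """Roll out a policy for at most HORIZON steps.

    Returns (proxy_reward, reached_goal).
    """
    pos, proxy_reward = 0, 0
    for t in range(HORIZON):
        # The policy returns a move of +1 or -1; clamp the position to the track.
        pos = max(0, min(GOAL, pos + policy(pos, t)))
        if pos == CHECKPOINT:
            proxy_reward += 1  # the reward only checks the checkpoint
        if pos == GOAL:
            return proxy_reward, True  # the episode ends at the goal
    return proxy_reward, False


def intended_policy(pos, t):
    """Always move right, which completes the task."""
    return 1


def hacking_policy(pos, t):
    """Walk to the checkpoint, then step off and back on to farm reward."""
    return 1 if pos < CHECKPOINT else -1


if __name__ == "__main__":
    print("intended:", run(intended_policy))
    print("hacking: ", run(hacking_policy))
```

In this sketch the policy that completes the task earns a proxy reward of 1, while the oscillating policy earns 5 and never reaches the goal. That gap between measured reward and intended outcome is what the quoted definition describes.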