What I Read: Adversarial Attacks on LLMs
https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/
Lilian Weng
October 25, 2023
“Adversarial attacks are inputs that trigger the model to output something undesired.”