What I Read: Adversarial Attacks on LLMs
https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/
Lilian Weng
October 25, 2023
“Adversarial attacks are inputs that trigger the model to output something undesired.”