Scaling Laws for Adversarial Attacks on Language Model Activations and Tokens

Stanislav Fort. Scaling Laws for Adversarial Attacks on Language Model Activations and Tokens. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025.
