"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

Edoardo Mosca, Shreyash Agarwal, Javier Rando-Ramirez, Georg Groh. "That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks. In Smaranda Muresan, Preslav Nakov, Aline Villavicencio, editors, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022. Pages 7806-7816. Association for Computational Linguistics, 2022.

Authors

Edoardo Mosca

Shreyash Agarwal

Javier Rando-Ramirez

Georg Groh
