No offence, Bert - I insult only humans! Multilingual sentence-level attack on toxicity detection networks

Sergey Berezin, Reza Farahbakhsh, Noël Crespi. No offence, Bert - I insult only humans! Multilingual sentence-level attack on toxicity detection networks. In Houda Bouamor, Juan Pino, Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 2362-2369. Association for Computational Linguistics, 2023.
