Attack Prompt Generation for Red Teaming and Defending Large Language Models

Boyi Deng, Wenjie Wang, Fuli Feng, Yang Deng, Qifan Wang, Xiangnan He. Attack Prompt Generation for Red Teaming and Defending Large Language Models. In Houda Bouamor, Juan Pino, Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023. Pages 2176-2189, Association for Computational Linguistics, 2023.
