Yuxia Wang, Haonan Li, Xudong Han, Preslav Nakov, Timothy Baldwin. Do-Not-Answer: Evaluating Safeguards in LLMs. In Yvette Graham, Matthew Purver, editors, Findings of the Association for Computational Linguistics: EACL 2024, St. Julian's, Malta, March 17-22, 2024, pages 896-911. Association for Computational Linguistics, 2024.