Do-Not-Answer: Evaluating Safeguards in LLMs


Yuxia Wang, Haonan Li, Xudong Han, Preslav Nakov, Timothy Baldwin. Do-Not-Answer: Evaluating Safeguards in LLMs. In Yvette Graham, Matthew Purver, editors, Findings of the Association for Computational Linguistics: EACL 2024, St. Julian's, Malta, March 17-22, 2024, pages 896-911. Association for Computational Linguistics, 2024.

