The following publications are possibly variants of this publication:
- Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization. Zhexin Zhang, Junxiao Yang, Pei Ke, Fei Mi, Hongning Wang, Minlie Huang. ACL 2024: 8865-8887
- Defending ChatGPT against jailbreak attack via self-reminders. Yueqi Xie, Jingwei Yi, Jiawei Shao, Justin Curl, Lingjuan Lyu, Qifeng Chen, Xing Xie 0001, Fangzhao Wu. Nature Machine Intelligence, 5(12):1486-1496, December 2023
- Unraveling the Mystery: Defending Against Jailbreak Attacks Via Unearthing Real Intention. Yanhao Li, Hongshen Chen, Heng Zhang, Zhiwei Ge, Tianhao Li, Sulong Xu, Guibo Luo. COLING 2025: 8374-8384
- SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding. Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia 0001, Bill Yuchen Lin, Radha Poovendran. ACL 2024: 5587-5605
- Defending Large Language Models Against Jailbreak Attacks Through Chain of Thought Prompting. Yanfei Cao, Naijie Gu, Xinyue Shen, Daiyuan Yang, Xingmin Zhang. NaNA 2024: 125-130
- Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing. Wei Zhao, Zhe Li, Yige Li, Ye Zhang, Jun Sun 0001. EMNLP 2024: 5094-5109
- PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition. Ziyang Zhang, Qizhen Zhang, Jakob Nicolaus Foerster. ICML 2024