Jailbreak Attack Initializations as Extractors of Compliance Directions

Amit Levi, Rom Himelstein, Yaniv Nemcovsky, Avi Mendelson, Chaim Baskin. Jailbreak Attack Initializations as Extractors of Compliance Directions. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng, editors, Findings of the Association for Computational Linguistics: EMNLP 2025, Suzhou, China, November 4-9, 2025, pages 6672-6705. Association for Computational Linguistics, 2025.
