Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models

Jiahui Li, Yongchang Hao, Haoyu Xu, Xing Wang, Yu Hong. Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models. In Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, editors, Proceedings of the 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, UAE, January 19-24, 2025, pages 4535-4547, Association for Computational Linguistics, 2025.