Defending LLMs against jailbreak attacks through representation offset detection

Shuo Liu, Xiang Cheng, ZhenZhong Zheng, Sen Su. Defending LLMs against jailbreak attacks through representation offset detection. Inf. Process. Manage., 63(5):104662, 2026.