Cheng Qian, Hainan Zhang, Lei Sha, Zhiming Zheng. HSF: Defending against Jailbreak Attacks with Hidden State Filtering. In Guodong Long, Michael Blumenstein, Yi Chang, Liane Lewin-Eytan, Zi Helen Huang, Elad Yom-Tov, editors, Companion Proceedings of the ACM on Web Conference 2025, WWW 2025, Sydney, NSW, Australia, 28 April 2025 - 2 May 2025. pages 2078-2087, ACM, 2025.