Separate the Wheat from the Chaff: A Post-Hoc Approach to Safety Re-Alignment for Fine-Tuned Language Models

Di Wu, Xin Lu, Yanyan Zhao, Bing Qin 0001. Separate the Wheat from the Chaff: A Post-Hoc Approach to Safety Re-Alignment for Fine-Tuned Language Models. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar, editors, Findings of the Association for Computational Linguistics, ACL 2025, Vienna, Austria, July 27 - August 1, 2025. pages 1210-1225, Association for Computational Linguistics, 2025. [doi]

Abstract

Abstract is missing.