Targeted Vaccine: Safety Alignment for Large Language Models Against Harmful Fine-Tuning via Layer-Wise Perturbation

Guozhi Liu, Weiwei Lin, Qi Mu, Tiansheng Huang, Ruichao Mo, Yuren Tao, Li Shen. Targeted Vaccine: Safety Alignment for Large Language Models Against Harmful Fine-Tuning via Layer-Wise Perturbation. IEEE Transactions on Information Forensics and Security, 20:10806-10817, 2025.
