The gains do not make up for the losses: a comprehensive evaluation for safety alignment of large language models via machine unlearning

Weixiang Zhao, Yulin Hu, Xingyu Sui, Zhuojun Li, Yang Deng, Yanyan Zhao, Bing Qin, Wanxiang Che. The gains do not make up for the losses: a comprehensive evaluation for safety alignment of large language models via machine unlearning. Frontiers of Computer Science in China, 20(2):2002319, February 2026.
