Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs With Minimal Human Interventions

Jingxin Xu, Guoshun Nan, Sheng Guan, Sicong Leng, Yilian Liu, Zixiao Wang, YuYang Ma, Zhili Zhou, Yanzhao Hou, Xiaofeng Tao. Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs With Minimal Human Interventions. IEEE Transactions on Information Forensics and Security, 21:1409-1423, 2026.
