Yuxin Xiao, Chaoqun Wan, Yonggang Zhang 0003, Wenxiao Wang 0001, Binbin Lin, Xiaofei He 0001, Xu Shen 0001, Jieping Ye. Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, Cheng Zhang 0005, editors, Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024. 2024. [doi]
Abstract is missing.