Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models

Zhuoran Jin, Pengfei Cao, Hongbang Yuan, Yubo Chen, Jiexin Xu, Huaijun Li, Xiaojian Jiang, Kang Liu, Jun Zhao. Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models. In Lun-Wei Ku, Andre Martins, Vivek Srikumar, editors, Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11-16, 2024, pages 1193-1215. Association for Computational Linguistics, 2024.
