Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model

Abhijith Chintam, Rahel Beloch, Willem H. Zuidema, Michael Hanna, Oskar van der Wal. Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model. In Yonatan Belinkov, Sophie Hao, Jaap Jumelet, Najoung Kim, Arya McCarthy, Hosein Mohebbi, editors, Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP@EMNLP 2023, Singapore, December 7, 2023. Pages 379-394, Association for Computational Linguistics, 2023.
