Layered Bias: Interpreting Bias in Pretrained Large Language Models

Nirmalendu Prakash, Roy Ka-Wei Lee. Layered Bias: Interpreting Bias in Pretrained Large Language Models. In Yonatan Belinkov, Sophie Hao, Jaap Jumelet, Najoung Kim, Arya McCarthy, Hosein Mohebbi, editors, Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP@EMNLP 2023, Singapore, December 7, 2023, pages 284-295. Association for Computational Linguistics, 2023.
