Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability

Atticus Geiger, Duligur Ibeling, Amir Zur, Maheep Chaudhary, Sonakshi Chauhan, Jing Huang, Aryaman Arora, Zhengxuan Wu, Noah D. Goodman, Christopher Potts, Thomas Icard. Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability. Journal of Machine Learning Research, 26, 2025.
