Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

Daniil Laptev, Nikita Balagansky, Yaroslav Aksenov, Daniil Gavrilov. Analyze Feature Flow to Enhance Interpretation and Steering in Language Models. In Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, July 13-19, 2025. OpenReview.net, 2025.

Abstract

Abstract is missing.