Forgetting Transformer: Softmax Attention with a Forget Gate

Zhixuan Lin, Evgenii Nikishin, Xu Owen He, Aaron C. Courville. Forgetting Transformer: Softmax Attention with a Forget Gate. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025.