Linear attention is (maybe) all you need (to understand Transformer optimization)

Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra. Linear attention is (maybe) all you need (to understand Transformer optimization). In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024.
