Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers

Tiberiu Musat. Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025.

Authors

Tiberiu Musat
