Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings

Ta-Chung Chi, Ting-Han Fan, Li-Wei Chen, Alexander Rudnicky, Peter J. Ramadge. Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings. In Anna Rogers, Jordan L. Boyd-Graber, Naoaki Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, pages 1183-1193. Association for Computational Linguistics, 2023.

Authors

Ta-Chung Chi

Ting-Han Fan

Li-Wei Chen

Alexander Rudnicky

Peter J. Ramadge