Qingru Zhang, Dhananjay Ram, Cole Hawkins, Sheng Zha, Tuo Zhao. Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer. In Houda Bouamor, Juan Pino, Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 2775-2786. Association for Computational Linguistics, 2023.