Cost-Optimal Grouped-Query Attention for Long-Context Modeling

Yingfa Chen, Yutong Wu, Chenyang Song, Zhen Leng Thai, Xingyu Shen, Xu Han 0007, Zhiyuan Liu 0001, Maosong Sun 0001. Cost-Optimal Grouped-Query Attention for Long-Context Modeling. In Christos Christodoulopoulos 0001, Tanmoy Chakraborty 0002, Carolyn Rose, Violet Peng, editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, EMNLP 2025, Suzhou, China, November 4-9, 2025, pages 5360-5376. Association for Computational Linguistics, 2025.

Authors

Yingfa Chen

Yutong Wu

Chenyang Song

Zhen Leng Thai

Xingyu Shen

Xu Han 0007

Zhiyuan Liu 0001

Maosong Sun 0001
