Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao. GEAR: An Efficient Error Reduction Framework for KV Cache Compression in LLM Inference. In Mehdi Rezagholizadeh, Peyman Passban, Soheila Samiee, Vahid Partovi Nia, Yu Cheng, Yue Deng, Qun Liu, Boxing Chen, editors, NeurIPS Efficient Natural Language and Speech Processing Workshop, 14 December 2024, Vancouver, British Columbia, Canada. Volume 262 of Proceedings of Machine Learning Research, pages 305-321, PMLR, 2024.