Residual vector quantization for KV cache compression in large language model

Ankur Kumar. Residual vector quantization for KV cache compression in large language model. In Mehdi Rezagholizadeh, Peyman Passban, Soheila Samiee, Vahid Partovi Nia, Yu Cheng, Yue Deng, Qun Liu, Boxing Chen, editors, NeurIPS Efficient Natural Language and Speech Processing Workshop, 14 December 2024, Vancouver, British Columbia, Canada. Volume 262 of Proceedings of Machine Learning Research, pages 485-490, PMLR, 2024.
