NACL: A General and Effective KV Cache Eviction Framework for LLM at Inference Time

Yilong Chen, Guoxia Wang, Junyuan Shang, Shiyao Cui, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun, Dianhai Yu, Hua Wu. NACL: A General and Effective KV Cache Eviction Framework for LLM at Inference Time. In Lun-Wei Ku, Andre Martins, Vivek Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, pages 7913-7926. Association for Computational Linguistics, 2024.
