NACL: A General and Effective KV Cache Eviction Framework for LLM at Inference Time

Yilong Chen, Guoxia Wang, Junyuan Shang, Shiyao Cui, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun, Dianhai Yu, Hua Wu. NACL: A General and Effective KV Cache Eviction Framework for LLM at Inference Time. In Lun-Wei Ku, Andre Martins, Vivek Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, pages 7913-7926. Association for Computational Linguistics, 2024.

Authors

Yilong Chen

Guoxia Wang

Junyuan Shang

Shiyao Cui

Zhenyu Zhang

Tingwen Liu

Shuohuan Wang

Yu Sun

Dianhai Yu

Hua Wu
