HCAttention: Extreme KV cache compression via heterogeneous attention computing for LLMs

Dongquan Yang, Yifan Yang, Xiaotian Yu, Xianbiao Qi, Rong Xiao. HCAttention: Extreme KV cache compression via heterogeneous attention computing for LLMs. Neurocomputing, 697:134247, 2026.
