TokenSkipping: A Practical and Robust KV Cache Pruning Method for Long-Context LLM Inference

Narupol Hongthai, Ekapol Chuangsuwanich. TokenSkipping: A Practical and Robust KV Cache Pruning Method for Long-Context LLM Inference. In Proceedings of the 13th International Conference on Information Technology: IoT and Smart City (ICIT 2025), Shanghai, China, December 5-7, 2025, pages 195-200. ACM, 2025.
