Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference

Junyan Li, Li Lyna Zhang, Jiahang Xu, Yujing Wang, Shaoguang Yan, Yunqing Xia, Yuqing Yang, Ting Cao, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang. Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference. In Ambuj Singh, Yizhou Sun, Leman Akoglu, Dimitrios Gunopulos, Xifeng Yan, Ravi Kumar, Fatma Ozcan, Jieping Ye, editors, Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023, Long Beach, CA, USA, August 6-10, 2023. Pages 1280-1290, ACM, 2023.
