TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction

Junyi Liu, LiangZhi Li, Tong Xiang, Bowen Wang, Yiming Qian. TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction. In Houda Bouamor, Juan Pino, Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 9796-9810. Association for Computational Linguistics, 2023.
