Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?

Cheng Zhang, Jianyi Cheng, Ilia Shumailov, George A. Constantinides, Yiren Zhao. Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference? In Houda Bouamor, Juan Pino, Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, pages 9988-10006. Association for Computational Linguistics, 2023.
