Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Han Guo, William Brandon, Radostin Cholakov, Jonathan Ragan-Kelley, Eric P. Xing, Yoon Kim. Fast Matrix Multiplications for Lookup Table-Quantized LLMs. In Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen, editors, Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, Florida, USA, November 12-16, 2024, pages 12419-12433. Association for Computational Linguistics, 2024.
