ITERA-LLM: Boosting Sub-8-Bit Large Language Model Inference via Iterative Tensor Decomposition

Yinting Huang, Keran Zheng, Zhewen Yu, Christos-Savvas Bouganis. ITERA-LLM: Boosting Sub-8-Bit Large Language Model Inference via Iterative Tensor Decomposition. In 33rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2025, Fayetteville, AR, USA, May 4-7, 2025. pages 114-122, IEEE, 2025.
