Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization

Jungi Lee, Wonbeom Lee, Jaewoong Sim. Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization. In 51st ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2024, Buenos Aires, Argentina, June 29 - July 3, 2024, pages 1048-1062. IEEE, 2024.
