Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving

Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci. Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving. In Phillip B. Gibbons, Gennady Pekhimenko, Christopher De Sa, editors, Proceedings of the Seventh Annual Conference on Machine Learning and Systems, MLSys 2024, Santa Clara, CA, USA, May 13-16, 2024. mlsys.org, 2024.

Abstract

Abstract is missing.