Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving

Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci. Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving. In Phillip B. Gibbons, Gennady Pekhimenko, Christopher De Sa, editors, Proceedings of the Seventh Annual Conference on Machine Learning and Systems, MLSys 2024, Santa Clara, CA, USA, May 13-16, 2024. mlsys.org, 2024.

Authors

Yilong Zhao

Chien-Yu Lin

Kan Zhu

Zihao Ye

Lequn Chen

Size Zheng

Luis Ceze

Arvind Krishnamurthy

Tianqi Chen

Baris Kasikci
