QMoE: Sub-1-Bit Compression of Trillion Parameter Models

Elias Frantar, Dan Alistarh. QMoE: Sub-1-Bit Compression of Trillion Parameter Models. In Phillip B. Gibbons, Gennady Pekhimenko, Christopher De Sa, editors, Proceedings of the Seventh Annual Conference on Machine Learning and Systems, MLSys 2024, Santa Clara, CA, USA, May 13-16, 2024. mlsys.org, 2024.
