Elias Frantar, Dan Alistarh. QMoE: Sub-1-Bit Compression of Trillion Parameter Models. In Phillip B. Gibbons, Gennady Pekhimenko, Christopher De Sa, editors, Proceedings of the Seventh Annual Conference on Machine Learning and Systems, MLSys 2024, Santa Clara, CA, USA, May 13-16, 2024. mlsys.org, 2024.