High-Throughput Non-uniformly Quantized 3-bit LLM Inference

Yuang Chen, Wenqi Zeng, Jeffrey Xu Yu. High-Throughput Non-uniformly Quantized 3-bit LLM Inference. In Tony Hosking, Madan Musuvathi, Kenjiro Taura, editors, Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2026, Sydney, NSW, Australia, 31 January 2026 - 4 February 2026. pages 288-300, ACM, 2026.

Abstract

Abstract is missing.