Elias Frantar, Roberto L. Castro, Jiale Chen, Torsten Hoefler, Dan Alistarh. MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models. In Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2025), Las Vegas, NV, USA, March 1-5, 2025, pages 239-251. ACM, 2025.