Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference

Zongyue Qin, Ziniu Hu, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun. Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025.
