SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models

Fahao Chen, Peng Li, Tom H. Luan, Zhou Su, Jing Deng. SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models. In IEEE INFOCOM 2025 - IEEE Conference on Computer Communications, London, United Kingdom, May 19-22, 2025. Pages 1-10, IEEE, 2025.
