The following publications are possible variants of this publication:
- Medusa: Accelerating Serverless LLM Inference with Materialization. Shaoxun Zeng, Minhui Xie, Shiwei Gao, Youmin Chen, Youyou Lu. ASPLOS 2025: 653-668 [doi]
- Efficiency Unleashed: Inference Acceleration for LLM-based Recommender Systems with Speculative Decoding. Yunjia Xi, Hangyu Wang, Bo Chen, Jianghao Lin, Menghui Zhu, Weiwen Liu, Ruiming Tang, Zhewei Wei, Weinan Zhang 0001, Yong Yu 0001. SIGIR 2025: 1891-1901 [doi]
- Amphista: Bi-directional Multi-head Decoding for Accelerating LLM Inference. Zeping Li, Xinlong Yang, Ziheng Gao, Ji Liu, Guanchen Li, Zhuang Liu, Dong Li 0025, Jinzhang Peng, Lu Tian, Emad Barsoum. NAACL 2025: 8925-8938 [doi]