The following publications are possible variants of this publication:
- Medusa: Accelerating Serverless LLM Inference with Materialization. Shaoxun Zeng, Minhui Xie, Shiwei Gao, Youmin Chen, Youyou Lu. ASPLOS 2025: 653-668 [doi]
- Efficiency Unleashed: Inference Acceleration for LLM-based Recommender Systems with Speculative Decoding. Yunjia Xi, Hangyu Wang, Bo Chen, Jianghao Lin, Menghui Zhu, Weiwen Liu, Ruiming Tang, Zhewei Wei, Weinan Zhang 0001, Yong Yu 0001. SIGIR 2025: 1891-1901 [doi]
- Amphista: Bi-directional Multi-head Decoding for Accelerating LLM Inference. Zeping Li, Xinlong Yang, Ziheng Gao, Ji Liu, Guanchen Li, Zhuang Liu, Dong Li 0025, Jinzhang Peng, Lu Tian, Emad Barsoum. NAACL 2025: 8925-8938 [doi]