POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference


Aditya K. Kamath, Ramya Prabhu, Jayashree Mohan, Simon Peter, Ramachandran Ramjee, Ashish Panwar. POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference. In Lieven Eeckhout, Georgios Smaragdakis, Katai Liang, Adrian Sampson, Martha A. Kim, Christopher J. Rossbach, editors, Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ASPLOS 2025, Rotterdam, Netherlands, 30 March 2025 - 3 April 2025. Pages 897-912, ACM, 2025.

