ServerlessPD: Fast RDMA-Codesigned Disaggregated Prefill-Decoding for Serverless Inference of Large Language Models

Mingxuan Liu, Jianhua Gu, Tianhai Zhao. ServerlessPD: Fast RDMA-Codesigned Disaggregated Prefill-Decoding for Serverless Inference of Large Language Models. In Rong N. Chang, Carl K. Chang, Jingwei Yang, Nimanthi Atukorala, Dan Chen, Sumi Helal, Sasu Tarkoma, Qiang He, Tevfik Kosar, Claudio A. Ardagna, Amin Beheshti, Bo Cheng, Walid Gaaloul, editors, IEEE International Conference on Web Services, ICWS 2025, Helsinki, Finland, July 7-12, 2025. Pages 305-315, IEEE, 2025.
