PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System

Yintao He, Haiyu Mao, Christina Giannoula, Mohammad Sadrosadati, Juan Gómez-Luna, Huawei Li, Xiaowei Li, Ying Wang, Onur Mutlu. PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System. In Lieven Eeckhout, Georgios Smaragdakis, Katai Liang, Adrian Sampson, Martha A. Kim, Christopher J. Rossbach, editors, Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ASPLOS 2025, Rotterdam, Netherlands, 30 March 2025 - 3 April 2025, pages 766-782, ACM, 2025.

@inproceedings{HeMGSGLLWM25,
  title = {{PAPI}: Exploiting Dynamic Parallelism in Large Language Model Decoding with a {Processing-In-Memory}-Enabled Computing System},
  author = {Yintao He and Haiyu Mao and Christina Giannoula and Mohammad Sadrosadati and Juan Gómez-Luna and Huawei Li and Xiaowei Li and Ying Wang and Onur Mutlu},
  year = {2025},
  doi = {10.1145/3676641.3716009},
  url = {https://doi.org/10.1145/3676641.3716009},
  researchr = {https://researchr.org/publication/HeMGSGLLWM25},
  cites = {0},
  citedby = {0},
  pages = {766--782},
  booktitle = {Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ASPLOS 2025, Rotterdam, Netherlands, 30 March 2025 - 3 April 2025},
  editor = {Lieven Eeckhout and Georgios Smaragdakis and Katai Liang and Adrian Sampson and Martha A. Kim and Christopher J. Rossbach},
  publisher = {ACM},
  isbn = {979-8-4007-1079-7},
}