PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System

Yintao He, Haiyu Mao, Christina Giannoula, Mohammad Sadrosadati, Juan Gómez-Luna, Huawei Li 0001, Xiaowei Li 0001, Ying Wang 0001, Onur Mutlu. PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System. In Lieven Eeckhout, Georgios Smaragdakis, Katai Liang, Adrian Sampson, Martha A. Kim, Christopher J. Rossbach, editors, Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ASPLOS 2025, Rotterdam, Netherlands, 30 March 2025 - 3 April 2025. Pages 766-782, ACM, 2025.

Authors

Yintao He

Haiyu Mao

Christina Giannoula

Mohammad Sadrosadati

Juan Gómez-Luna

Huawei Li 0001

Xiaowei Li 0001

Ying Wang 0001

Onur Mutlu
