Juntao Zhao, Borui Wan, Chuan Wu, Yanghua Peng, Haibin Lin. POSTER: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization. In Michel Steuwer, I-Ting Angelina Lee, Milind Chabbi, editors, Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2024, Edinburgh, United Kingdom, March 2-6, 2024, pages 460-462. ACM, 2024.