Transformer-Based Resource and Stage-Aware Scheduling for Model-Parallel LLM Inference

Rami Naeem, Tengis Buyantogtokh, Hamada Rizk, Tatsuya Amano, Hirozumi Yamaguchi. Transformer-Based Resource and Stage-Aware Scheduling for Model-Parallel LLM Inference. In Keiichi Yasumoto, Idit Keidar, Hirozumi Yamaguchi, Simone Silvestri, Vincent Gramoli, editors, Companion Proceedings of the 27th International Conference on Distributed Computing and Networking, ICDCN Companion 2026, Nara, Japan, January 6-9, 2026. Pages 72-77, ACM, 2026.

Abstract

Abstract is missing.