PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models

Eunyeong Cho, Jehyeon Bang, Ranggi Hwang, Minsoo Rhu. PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models. In IEEE International Symposium on High Performance Computer Architecture, HPCA 2026, Sydney, Australia, January 31 - February 4, 2026. Pages 1-16, IEEE, 2026.