Yanying Lin, Yanbo Li, Shijie Peng, Yingfei Tang, Shutian Luo, Haiying Shen, Cheng-Zhong Xu, Kejiang Ye. QUART: Latency-Aware FaaS System for Pipelining Large Model Inference. In 44th IEEE International Conference on Distributed Computing Systems, ICDCS 2024, Jersey City, NJ, USA, July 23-26, 2024, pages 1-12. IEEE, 2024.