QUART: Latency-Aware FaaS System for Pipelining Large Model Inference

Yanying Lin, Yanbo Li, Shijie Peng, Yingfei Tang, Shutian Luo, Haiying Shen, Cheng-Zhong Xu 0001, Kejiang Ye. QUART: Latency-Aware FaaS System for Pipelining Large Model Inference. In 44th IEEE International Conference on Distributed Computing Systems, ICDCS 2024, Jersey City, NJ, USA, July 23-26, 2024. Pages 1-12, IEEE, 2024.

Authors

Yanying Lin

Yanbo Li

Shijie Peng

Yingfei Tang

Shutian Luo

Haiying Shen

Cheng-Zhong Xu 0001

Kejiang Ye
