SPLIT: QoS-Aware DNN Inference on Shared GPU via Evenly-Sized Model Splitting

Diaohan Luo, Tian Yu, Yuewen Wu, Heng Wu, Tao Wang, Wenbo Zhang. SPLIT: QoS-Aware DNN Inference on Shared GPU via Evenly-Sized Model Splitting. In Proceedings of the 52nd International Conference on Parallel Processing, ICPP 2023, Salt Lake City, UT, USA, August 7-10, 2023. Pages 605-614, ACM, 2023.
