Shulai Zhang, Weihao Cui, Quan Chen, Zhengnian Zhang, Yue Guan, Jingwen Leng, Chao Li, Minyi Guo. PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences. In Lawrence Rauchwerger, Kirk W. Cameron, Dimitrios S. Nikolopoulos, Dionisios N. Pnevmatikatos, editors, ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28 - 30, 2022. ACM, 2022.