TOP: Task-Based Operator Parallelism for Asynchronous Deep Learning Inference on GPU

Changyao Lin, Zhenming Chen, Ziyang Zhang, Jie Liu. TOP: Task-Based Operator Parallelism for Asynchronous Deep Learning Inference on GPU. IEEE Trans. Parallel Distrib. Syst., 36(2):266-281, February 2025.
