InferLine: latency-aware provisioning and scaling for prediction serving pipelines

Daniel Crankshaw, Gur-Eyal Sela, Xiangxi Mo, Corey Zumar, Ion Stoica, Joseph Gonzalez, Alexey Tumanov. InferLine: latency-aware provisioning and scaling for prediction serving pipelines. In Rodrigo Fonseca, Christina Delimitrou, Beng Chin Ooi, editors, SoCC '20: ACM Symposium on Cloud Computing, Virtual Event, USA, October 19-21, 2020. Pages 477-491, ACM, 2020.
