Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems

Mehran Salmani, Saeid Ghafouri, Alireza Sanaee, Kamran Razavi, Max Mühlhäuser, Joseph Doyle, Pooyan Jamshidi, Mohsen Sharifi. Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems. In Eiko Yoneki, Luigi Nardi, editors, Proceedings of the 3rd Workshop on Machine Learning and Systems, EuroMLSys 2023, Rome, Italy, 8 May 2023. Pages 78-86, ACM, 2023.
