Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems

Mehran Salmani, Saeid Ghafouri, Alireza Sanaee, Kamran Razavi, Max Mühlhäuser, Joseph Doyle, Pooyan Jamshidi, Mohsen Sharifi. Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems. In Eiko Yoneki, Luigi Nardi, editors, Proceedings of the 3rd Workshop on Machine Learning and Systems, EuroMLSys 2023, Rome, Italy, 8 May 2023, pages 78-86. ACM, 2023.

Authors

Mehran Salmani

Saeid Ghafouri

Alireza Sanaee

Kamran Razavi

Max Mühlhäuser

Joseph Doyle

Pooyan Jamshidi

Mohsen Sharifi