Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction

Ke Cheng, Wen Hu, Zhi Wang, Peng Du, Jianguo Li, Sheng Zhang. Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction. In IEEE International Conference on Web Services (ICWS 2024), Shenzhen, China, July 7-13, 2024, pages 853-864. IEEE, 2024.

Abstract

Abstract is missing.