Ke Cheng, Wen Hu, Zhi Wang, Peng Du, Jianguo Li, Sheng Zhang. Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction. In IEEE International Conference on Web Services, ICWS 2024, Shenzhen, China, July 7-13, 2024. Pages 853-864, IEEE, 2024.