Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction

Ke Cheng, Wen Hu, Zhi Wang, Peng Du, Jianguo Li, Sheng Zhang 0001. Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction. In IEEE International Conference on Web Services, ICWS 2024, Shenzhen, China, July 7-13, 2024, pages 853-864. IEEE, 2024.

Authors

Ke Cheng

Wen Hu

Zhi Wang

Peng Du

Jianguo Li

Sheng Zhang 0001