Xinyuan Zhang, Jiang Liu 0010, Zehui Xiong, Yudong Huang, Gaochang Xie, Ran Zhang 0004. Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization. In IEEE Wireless Communications and Networking Conference, WCNC 2024, Dubai, United Arab Emirates, April 21-24, 2024. Pages 1-6, IEEE, 2024.