Runtime Prediction for Local Deployment of Large Language Models: A Case Study on Qwen Models Covering LoRA Fine-Tuning, RAG, and Inference

Jian Guo, Jianwen Wei, Yufei Cheng, Jiajie Sheng, Yijun Wu, Kento Sato. Runtime Prediction for Local Deployment of Large Language Models: A Case Study on Qwen Models Covering LoRA Fine-Tuning, RAG, and Inference. In Proceedings of the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops, SCA/HPCAsiaWS 2026, Osaka, Japan, January 26-29, 2026. pages 361-369, ACM, 2026.
