Improving Throughput-oriented LLM Inference with CPU Computations

Daon Park, Bernhard Egger. Improving Throughput-oriented LLM Inference with CPU Computations. In Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, PACT 2024, Long Beach, CA, USA, October 14-16, 2024. Pages 233-245, ACM, 2024.
