ELLIE: Energy-Efficient LLM Inference at the Edge Via Prefill-Decode Splitting

Haoyang Fan, Yi-Chien Lin, Viktor K. Prasanna. ELLIE: Energy-Efficient LLM Inference at the Edge Via Prefill-Decode Splitting. In 36th IEEE International Conference on Application-specific Systems, Architectures and Processors, ASAP 2025, Vancouver, BC, Canada, July 28-30, 2025. pages 139-146, IEEE, 2025.
