PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation

Branden Butler, Sixing Yu, Arya Mazaheri, Ali Jannesari. PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2024, Atlanta, GA, USA, November 17-22, 2024. pages 40, IEEE, 2024.

Authors

Branden Butler

Sixing Yu

Arya Mazaheri

Ali Jannesari
