Branden Butler, Sixing Yu, Arya Mazaheri, Ali Jannesari. PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2024, Atlanta, GA, USA, November 17-22, 2024, p. 40. IEEE, 2024.