Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Ré, Beidi Chen. Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time. In Andreas Krause, Emma Brunskill, KyungHyun Cho, Barbara Engelhardt, Sivan Sabato, Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA. Volume 202 of Proceedings of Machine Learning Research, pages 22137-22176, PMLR, 2023.
