João Monteiro, Étienne Marcotte, Pierre-André Noël, Valentina Zantedeschi, David Vázquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian. XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference. In Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen, editors, Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, Florida, USA, November 12-16, 2024, pages 15284-15302. Association for Computational Linguistics, 2024.