XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference

João Monteiro, Étienne Marcotte, Pierre-André Noël, Valentina Zantedeschi, David Vázquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian. XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference. In Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen, editors, Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, Florida, USA, November 12-16, 2024, pages 15284-15302. Association for Computational Linguistics, 2024.

Authors

João Monteiro

Étienne Marcotte

Pierre-André Noël

Valentina Zantedeschi

David Vázquez

Nicolas Chapados

Christopher Pal

Perouz Taslakian
