HashAttention: Semantic Sparsity for Faster Inference

Aditya Desai, Shuo Yang, Alejandro Cuadron, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica. HashAttention: Semantic Sparsity for Faster Inference. In Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, July 13-19, 2025. OpenReview.net, 2025.

Authors

Aditya Desai

Shuo Yang

Alejandro Cuadron

Matei Zaharia

Joseph E. Gonzalez

Ion Stoica