Enabling Efficient SpMM for Sparse Attention on GEMM-Optimized Hardware with Block Aggregation

Tianchu Ji, Niranjan Balasubramanian, Michael Ferdman, Peter A. Milder. Enabling Efficient SpMM for Sparse Attention on GEMM-Optimized Hardware with Block Aggregation. In Jing Li, Grace Zgheib, editors, Proceedings of the 2026 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA 2026, Seaside, CA, USA, February 22-24, 2026. Pages 67-78, ACM, 2026.
