SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning

Hanrui Wang, Zhekai Zhang, Song Han. SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning. In IEEE International Symposium on High-Performance Computer Architecture, HPCA 2021, Seoul, South Korea, February 27 - March 3, 2021. Pages 97-110, IEEE, 2021.
