E.T.: re-thinking self-attention for transformer models on GPUs

Shiyang Chen, Shaoyi Huang, Santosh Pandey, Bingbing Li, Guang R. Gao, Long Zheng, Caiwen Ding, Hang Liu. E.T.: re-thinking self-attention for transformer models on GPUs. In Bronis R. de Supinski, Mary W. Hall, Todd Gamblin, editors, SC '21: The International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, Missouri, USA, November 14-19, 2021. Article 25, ACM, 2021.
