E.T.: re-thinking self-attention for transformer models on GPUs

Shiyang Chen, Shaoyi Huang, Santosh Pandey, Bingbing Li, Guang R. Gao, Long Zheng, Caiwen Ding, Hang Liu. E.T.: re-thinking self-attention for transformer models on GPUs. In Bronis R. de Supinski, Mary W. Hall, Todd Gamblin, editors, SC '21: The International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, Missouri, USA, November 14 - 19, 2021. pages 25, ACM, 2021. doi:10.1145/3458817.3476138

@inproceedings{ChenHPLG0D021,
  title = {{E.T.}: re-thinking self-attention for transformer models on {GPUs}},
  author = {Shiyang Chen and Shaoyi Huang and Santosh Pandey and Bingbing Li and Guang R. Gao and Long Zheng and Caiwen Ding and Hang Liu},
  year = {2021},
  doi = {10.1145/3458817.3476138},
  url = {https://doi.org/10.1145/3458817.3476138},
  researchr = {https://researchr.org/publication/ChenHPLG0D021},
  pages = {25},
  booktitle = {SC '21: The International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, Missouri, USA, November 14 - 19, 2021},
  editor = {Bronis R. de Supinski and Mary W. Hall and Todd Gamblin},
  publisher = {ACM},
  isbn = {978-1-4503-8442-1},
}