Zheng Qu, Liu Liu, Fengbin Tu, Zhaodong Chen, Yufei Ding, Yuan Xie. DOTA: detect and omit weak attentions for scalable transformer acceleration. In Babak Falsafi, Michael Ferdman, Shan Lu, Thomas F. Wenisch, editors, ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022 - 4 March 2022. pages 14-26, ACM, 2022.