The following publications are possibly variants of this publication:
- Attention as a Bayesian inference process. Sharat Chikkerur, Thomas Serre, Cheston Tan, Tomaso A. Poggio. hvei 2011: 786511 [doi]
- cosFormer: Rethinking Softmax In Attention. Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong. iclr 2022 [doi]
- On the diversity of multi-head attention. Jian Li, Xing Wang 0007, Zhaopeng Tu, Michael R. Lyu. ijon, 454:14-24, 2021. [doi]
- Rethinking the Self-Attention in Vision Transformers. Kyungmin Kim, Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Zhicheng Yan, Peter Vajda, Seon Joo Kim. cvpr 2021: 3071-3075 [doi]
- Rethinking the role of error in attentional learning. Mark R. Blair, R. Calen Walshe, Jordan I. Barnes, Lihan Chen. cogsci 2011 [doi]
- Triplet Attention: Rethinking the Similarity in Transformers. Haoyi Zhou, Jianxin Li 0002, Jieqi Peng, Shuai Zhang, Shanghang Zhang. kdd 2021: 2378-2388 [doi]