The following publications are possibly variants of this publication:
- Efficient Vision Transformer via Token Merger. Zhanzhou Feng, Shiliang Zhang. TIP, 32:4156-4169, 2023.
- Recurring the Transformer for Video Action Recognition. Jiewen Yang, Xingbo Dong, Liujun Liu, Chao Zhang, Jiajun Shen, Dahai Yu. CVPR 2022: 14043-14053.
- No Token Left Behind: Efficient Vision Transformer via Dynamic Token Idling. Xuwei Xu, Changlin Li, Yudong Chen 0002, Xiaojun Chang, Jiajun Liu, Sen Wang 0001. AusAI 2024: 28-41.