The following publications are possibly variants of this publication:
- Leveraging Batch Normalization for Vision Transformers. Zhuliang Yao, Yue Cao 0001, Yutong Lin, Ze Liu, Zheng Zhang 0022, Han Hu 0004. ICCVW 2021: 413-422
- Temporally Efficient Vision Transformer for Video Instance Segmentation. Shusheng Yang, Xinggang Wang, Yu Li 0003, Yuxin Fang, Jiemin Fang, Wenyu Liu 0001, Xun Zhao, Ying Shan. CVPR 2022: 2875-2885
- LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition. Youbing Hu, Yun Cheng, Anqi Lu, Zhiqiang Cao, DaWei Wei, Jie Liu 0001, Zhijun Li 0002. AAAI 2024: 2274-2284
- IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers. Bowen Pan, Rameswar Panda, Yifan Jiang 0001, Zhangyang Wang, Rogério Feris, Aude Oliva. NeurIPS 2021: 24898-24911