The following publications are possible variants of this publication:
- SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning. Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed 0001, Zhe Gan, Zicheng Liu 0001, Yumao Lu, Lijuan Wang. cvpr 2022: 17928-17937 [doi]
- Scenario-Aware Recurrent Transformer for Goal-Directed Video Captioning. Xin Man, Deqiang Ouyang, XiangPeng Li, Jingkuan Song, Jie Shao. tomccap, 18(4), 2022. [doi]
- Depth-Aware Sparse Transformer for Video-Language Learning. Haonan Zhang, Lianli Gao, Pengpeng Zeng, Alan Hanjalic, Heng Tao Shen. mm 2023: 4778-4787 [doi]
- Context-aware transformer for image captioning. Xin Yang, Ying Wang, HaiShun Chen, Jie Li, Tingting Huang. ijon, 549:126440, September 2023. [doi]
- A Sparse Transformer-Based Approach for Image Captioning. Zhou Lei, Congcong Zhou, Shengbo Chen, Yiyong Huang, Xianrui Liu. access, 8:213437-213446, 2020. [doi]
- Parallel Pathway Dense Video Captioning With Deformable Transformer. Wangyu Choi, Jiasi Chen, Jongwon Yoon. access, 10:129899-129910, 2022. [doi]