The following publications are possibly variants of this publication:
- Vision-Language Transformer and Query Generation for Referring Segmentation. Henghui Ding, Chang Liu, Suchen Wang, Xudong Jiang. ICCV 2021: 16301-16310
- Cross-modal transformer with language query for referring image segmentation. Wenjing Zhang, Quange Tan, Pengxin Li, Qi Zhang, Rong Wang. Neurocomputing, 536:191-205, June 2023.
- SLViT: Scale-Wise Language-Guided Vision Transformer for Referring Image Segmentation. Shuyi Ouyang, Hongyi Wang, Shiao Xie, Ziwei Niu, Ruofeng Tong 0001, Yen-Wei Chen 0001, Lanfen Lin. IJCAI 2023: 1294-1302
- LAVT: Language-Aware Vision Transformer for Referring Image Segmentation. Zhao Yang 0002, Jiaqi Wang, Yansong Tang, Kai Chen 0026, Hengshuang Zhao, Philip H. S. Torr. CVPR 2022: 18134-18144
- Language as Queries for Referring Video Object Segmentation. Jiannan Wu, Yi Jiang, Peize Sun, Zehuan Yuan, Ping Luo 0002. CVPR 2022: 4964-4974