- Haonan Cheng, Hanyue Liu, JuanJuan Cai, Long Ye. CLFormer: a cross-lingual transformer framework for temporal forgery localization. Vis. Intell., 3(1), 2025.
- Yifei Deng, Zhengyu Chen, Chenglong Li 0002, Jin Tang 0001. Uncertainty-aware coarse-to-fine alignment for text-image person retrieval. Vis. Intell., 3(1), 2025.
- Yichen Shi, Yuhao Gao, Yingxin Lai, Hongyang Wang, Jun Feng, Lei He, Jun Wan 0001, Changsheng Chen, Zitong Yu, Xiaochun Cao. SHIELD: an evaluation benchmark for face spoofing and forgery detection with multimodal large language models. Vis. Intell., 3(1), 2025.
- Hang Zhang, Wenxiao Zhang, Haoxuan Qu, Jun Liu 0036. Enhancing human-centered dynamic scene understanding via multiple LLMs collaborated reasoning. Vis. Intell., 3(1), 2025.
- Jiaxin Mei, Tao Zhou 0002, Kaiwen Huang, Yizhe Zhang 0001, Yi Zhou 0007, Ye Wu 0001, Huazhu Fu. A survey on deep learning for polyp segmentation: techniques, challenges and future trends. Vis. Intell., 3(1), 2025.
- Xiaohan Fang, Peilin Chen 0001, Meng Wang 0017, Shiqi Wang 0001. Immersive video interaction system: a survey. Vis. Intell., 3(1), 2025.
- Suyan Li, Fuxiang Huang, Lei Zhang 0038. A survey of multimodal composite editing and retrieval. Vis. Intell., 3(1), 2025.
- Yingjia Xu, Mengxia Wu, Zixin Guo, Min Cao, Mang Ye, Jorma Laaksonen. Efficient text-to-video retrieval via multi-modal multi-tagger derived pre-screening. Vis. Intell., 3(1), 2025.
- Xiao Wang 0014, Yuehang Li, Wentao Wu, Jiandong Jin, Yao Rong, Bo Jiang 0002, Chuanfu Li, Jin Tang 0001. Pre-training on high-resolution X-ray images: an experimental study. Vis. Intell., 3(1), 2025.
- Ruikun Zhang, Zhiyuan Yang, Liyuan Pan. DehazeMamba: large multi-modal model guided single image dehazing via Mamba. Vis. Intell., 3(1), 2025.