| 0 | -- | 0 | Haonan Cheng, Hanyue Liu, JuanJuan Cai, Long Ye. CLFormer: a cross-lingual transformer framework for temporal forgery localization |
| 0 | -- | 0 | Yifei Deng, Zhengyu Chen, Chenglong Li 0002, Jin Tang 0001. Uncertainty-aware coarse-to-fine alignment for text-image person retrieval |
| 0 | -- | 0 | Yichen Shi, Yuhao Gao, Yingxin Lai, Hongyang Wang, Jun Feng, Lei He, Jun Wan 0001, Changsheng Chen, Zitong Yu, Xiaochun Cao. SHIELD: an evaluation benchmark for face spoofing and forgery detection with multimodal large language models |
| 0 | -- | 0 | Hang Zhang, Wenxiao Zhang, Haoxuan Qu, Jun Liu 0036. Enhancing human-centered dynamic scene understanding via multiple LLMs collaborated reasoning |
| 0 | -- | 0 | Jiaxin Mei, Tao Zhou 0002, Kaiwen Huang, Yizhe Zhang 0001, Yi Zhou 0007, Ye Wu 0001, Huazhu Fu. A survey on deep learning for polyp segmentation: techniques, challenges and future trends |
| 0 | -- | 0 | Xiaohan Fang, Peilin Chen 0001, Meng Wang 0017, Shiqi Wang 0001. Immersive video interaction system: a survey |
| 0 | -- | 0 | Suyan Li, Fuxiang Huang, Lei Zhang 0038. A survey of multimodal composite editing and retrieval |
| 0 | -- | 0 | Yingjia Xu, Mengxia Wu, Zixin Guo, Min Cao, Mang Ye, Jorma Laaksonen. Efficient text-to-video retrieval via multi-modal multi-tagger derived pre-screening |
| 0 | -- | 0 | Xiao Wang 0014, Yuehang Li, Wentao Wu, Jiandong Jin, Yao Rong, Bo Jiang 0002, Chuanfu Li, Jin Tang 0001. Pre-training on high-resolution X-ray images: an experimental study |
| 0 | -- | 0 | Ruikun Zhang, Zhiyuan Yang, Liyuan Pan. DehazeMamba: large multi-modal model guided single image dehazing via Mamba |
| 0 | -- | 0 | Qianggang Ding, Zhichao Shen, WeiQiang Zhu, Bang Liu. DASFormer: self-supervised pretraining for earthquake monitoring |
| 0 | -- | 0 | Mingjin Zhang, Qian Xu, Yuchun Wang, Xi Li, Haojuan Yuan. MIRSAM: multimodal vision-language segment anything model for infrared small target detection |
| 0 | -- | 0 | Zhe Cao 0001, Lixin Xu, Jin Zhang, Biwen Yang, Kaizheng Chen, Ruiheng Zhang. DBDB: de-bimodal defocus blur in joint infrared-visible imaging |
| 0 | -- | 0 | Yuli Zhou, Guolei Sun, Yawei Li 0001, Guo-Sen Xie, Luca Benini, Ender Konukoglu. When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation |
| 0 | -- | 0 | Yasheng Sun, Bohan Li, Mingchen Zhuge, Deng-Ping Fan, Salman H. Khan 0001, Fahad Shahbaz Khan, Hideki Koike. Connecting dreams with visual brainstorming instruction |