| 4448 | -- | 4451 | Liqiang Nie, Jianlong Wu, Nicu Sebe, Kiyoharu Aizawa. Guest Editorial Introduction to the Special Issue on Video Transformers |
| 4452 | -- | 4461 | Wei Wang, Xin Yang 0008, Jinhui Tang 0001. Vision Transformer With Hybrid Shifted Windows for Gastrointestinal Endoscopy Image Classification |
| 4462 | -- | 4471 | Yang Yu, Rongrong Ni, Yao Zhao 0001, Siyuan Yang, Fen Xia, Ning Jiang, Guoqing Zhao. MSVT: Multiple Spatiotemporal Views Transformer for DeepFake Video Detection |
| 4472 | -- | 4483 | Weili Guan, Xuemeng Song, Kejie Wang, Haokun Wen, Hongda Ni, Yaowei Wang, Xiaojun Chang. Egocentric Early Action Prediction via Multimodal Transformer-Based Dual Action Prediction |
| 4484 | -- | 4495 | Bofeng Wu, Buyu Liu, Peng Huang, Jun Bao, Peng Xi, Jun Yu 0002. Concept Parser With Multimodal Graph Learning for Video Captioning |
| 4496 | -- | 4506 | Fan Zhang 0045, Gongguan Chen, Hua Wang, Jinjiang Li, Caiming Zhang 0001. Multi-Scale Video Super-Resolution Transformer With Polynomial Approximation |
| 4507 | -- | 4517 | Feng Xue, Yu Li, Deyin Liu, Yincen Xie, Lin Wu 0001, Richang Hong. LipFormer: Learning to Lipread Unseen Speakers Based on Visual-Landmark Transformers |
| 4518 | -- | 4528 | Mingqi Gao 0003, Jinyu Yang, Jungong Han, Ke Lu, Feng Zheng, Giovanni Montana. Decoupling Multimodal Transformers for Referring Video Object Segmentation |
| 4529 | -- | 4541 | Rong Wang, Zongheng Tang, Qianli Zhou, Xiaoqian Liu, Tianrui Hui, Quange Tan, Si Liu 0001. Unified Transformer With Isomorphic Branches for Natural Language Tracking |
| 4542 | -- | 4551 | Yuhui Zheng, Yan Zhang, Bin Xiao 0002. Target-Aware Transformer Tracking |
| 4552 | -- | 4563 | Guanlin Chen, Pengfei Zhu, Bing Cao, Xing Wang, Qinghua Hu. Cross-Drone Transformer Network for Robust Single Object Tracking |
| 4564 | -- | 4576 | Di Gai, Runyang Feng, Weidong Min, Xiaosong Yang, Pengxiang Su, Qi Wang, Qing Han. Spatiotemporal Learning Transformer for Video-Based Human Pose Estimation |
| 4577 | -- | 4587 | Haipeng Chen 0002, Jiahui Hu, Wenyin Zhang, Pengxiang Su. Spatiotemporal Consistency Learning From Momentum Cues for Human Motion Prediction |
| 4588 | -- | 4602 | Wenfei Wan, Dengjia Huang, Bin Shang, Shengyu Wei, Hong Ren Wu, Jinjian Wu, Guangming Shi. Depth Perception Assessment of 3D Videos Based on Stereoscopic and Spatial Orientation Structural Features |
| 4603 | -- | 4615 | Yujie Hu, Yinhuai Wang, Jian Zhang 0018. DEAR-GAN: Degradation-Aware Face Restoration With GAN Prior |
| 4616 | -- | 4629 | Chengcheng Ma, Yang Liu, Jiankang Deng, Lingxi Xie, Weiming Dong, Changsheng Xu. Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models |
| 4630 | -- | 4644 | Jiaxin Yao, Yongqiang Zhao 0001, Yuanyang Bu, Seong G. Kong, Jonathan Cheung-Wai Chan. Laplacian Pyramid Fusion Network With Hierarchical Guidance for Infrared and Visible Image Fusion |
| 4645 | -- | 4659 | Ying Yang, Tao Xiang, Shangwei Guo, Xiao Lv, Hantao Liu, Xiaofeng Liao 0001. EHNQ: Subjective and Objective Quality Evaluation of Enhanced Night-Time Images |
| 4660 | -- | 4674 | Ying Huang, Hu Guan, Jie Liu 0028, Shuwu Zhang, Baoning Niu, Guixuan Zhang. Robust Texture-Aware Local Adaptive Image Watermarking With Perceptual Guarantee |
| 4675 | -- | 4688 | Wei Wu 0019, Yong Liu, Zhu Li 0001. Subband Differentiated Learning Network for Rain Streak Removal |
| 4689 | -- | 4702 | Yining Su, Lin Teng, Pengbo Liu, Salahuddin Unar, Xingyuan Wang 0001, Xianping Fu. Visualized Multiple Image Selection Encryption Based on Log Chaos System and Multilayer Cellular Automata Saliency Detection |
| 4703 | -- | 4714 | Yuyuan Zeng, Bowen Zhao, Shanzhao Qiu, Tao Dai 0001, Shu-Tao Xia. Toward Effective Image Manipulation Detection With Proposal Contrastive Learning |
| 4715 | -- | 4727 | Lu Sun, Yichen Wang, Fangfang Wu, Xin Li 0005, Weisheng Dong, Guangming Shi. Deep Unfolding Network for Efficient Mixed Video Noise Removal |
| 4728 | -- | 4740 | Xin Zhou, Xiao-wen Liu, Gong Zhang, Luliang Jia, Xu Wang 0015, Zhiyuan Zhao. An Iterative Threshold Algorithm of Log-Sum Regularization for Sparse Problem |
| 4741 | -- | 4753 | Bo Jiang, Yao Lu, Bob Zhang 0001, Guangming Lu. Few-Shot Learning for Image Denoising |
| 4754 | -- | 4768 | Pei Geng, Xuequan Lu, Chunyu Hu, Hong Liu 0013, Lei Lyu. Focusing Fine-Grained Action by Self-Attention-Enhanced Graph Neural Networks With Contrastive Learning |
| 4769 | -- | 4783 | Alejandro López-Cifuentes, Marcos Escudero-Viñolo, Jesús Bescós, Juan C. SanMiguel. Attention-Based Knowledge Distillation in Scene Recognition: The Impact of a DCT-Driven Loss |
| 4784 | -- | 4797 | Zheng Zhou, Yongyong Chen, Yicong Zhou. Deep Dynamic Memory Augmented Attentional Dictionary Learning for Image Denoising |
| 4798 | -- | 4811 | Leida Li, Yipo Huang, Jinjian Wu, Yuzhe Yang, Yaqian Li, Yandong Guo, Guangming Shi. Theme-Aware Visual Attribute Reasoning for Image Aesthetics Assessment |
| 4812 | -- | 4824 | Ninghui Xu, Lihui Wang 0003, Jiajia Zhao, Zhiting Yao. Denoising for Dynamic Vision Sensor Based on Augmented Spatiotemporal Correlation |
| 4825 | -- | 4839 | Runzhe Zhu, Ling Yin, Mingze Yang, Fei Wu 0006, Yuncheng Yang, Wenbo Hu. SUES-200: A Multi-Height Multi-Scene Cross-View Image Benchmark Across Drone and Satellite |
| 4840 | -- | 4854 | Haoning Wu, Chaofeng Chen, Liang Liao, Jingwen Hou, Wenxiu Sun, Qiong Yan, Weisi Lin. DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment |
| 4855 | -- | 4867 | Zicheng Feng, Wenlong Zhang, Shunkun Liang, Qifeng Yu. Deep Video Super-Resolution Using Hybrid Imaging System |
| 4868 | -- | 4880 | Zhi-Yong Wang, Xiao-peng Li, Hing-Cheung So, Abdelhak M. Zoubir. Adaptive Rank-One Matrix Completion Using Sum of Outer Products |
| 4881 | -- | 4892 | Huapeng Wu, Jie Gui, Jun Zhang 0024, James T. Kwok, Zhihui Wei. Feedback Pyramid Attention Networks for Single Image Super-Resolution |
| 4893 | -- | 4906 | Kai Zeng, Kejiang Chen, Weiming Zhang 0001, Yaofei Wang, Nenghai Yu. Robust Steganography for High Quality Images |
| 4907 | -- | 4920 | Zenan Shi, Haipeng Chen 0002, Dong Zhang. Transformer-Auxiliary Neural Networks for Image Manipulation Localization by Operator Inductions |
| 4921 | -- | 4933 | Shihao Zou, Yuanlu Xu, Chao Li 0021, Lingni Ma, Li Cheng 0001, Minh Vo. Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet |
| 4934 | -- | 4947 | Yu Liu 0023, Haihang Li, Juan Cheng, Xun Chen 0001. MSCAF-Net: A General Framework for Camouflaged Object Detection via Learning Multi-Scale Context-Aware Features |
| 4948 | -- | 4961 | Jun Li 0043, Yuquan Bi, Sumei Wang, Qiming Li. CFRLA-Net: A Context-Aware Feature Representation Learning Anchor-Free Network for Pedestrian Detection |
| 4962 | -- | 4972 | Huafeng Li, Minghui Liu, Zhanxuan Hu, Feiping Nie 0001, Zhengtao Yu 0001. Intermediary-Guided Bidirectional Spatial-Temporal Aggregation Network for Video-Based Visible-Infrared Person Re-Identification |
| 4973 | -- | 4984 | Shuang Li, Lichun Wang 0002, Shaofan Wang, Dehui Kong, Baocai Yin. Hierarchical Coupled Discriminative Dictionary Learning for Zero-Shot Learning |
| 4985 | -- | 4996 | Zhuoxu Huang, Zhiyou Zhao, Banghuai Li, Jungong Han. LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers |
| 4997 | -- | 5008 | Hao Zhang, Shenqi Lai, Yaxiong Wang, Zongyang Da, Yujie Dun, Xueming Qian. SCGNet: Shifting and Cascaded Group Network |
| 5009 | -- | 5021 | Ruyi Ji, Jiaying Li, Libo Zhang 0001, Jing Liu, Yanjun Wu. Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification |
| 5022 | -- | 5035 | Hao Ren, Ziqiang Zheng, Yang Wu 0001, Hong Lu 0001, Yang Yang 0002, Ying Shan, Sai Kit Yeung. ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval |
| 5036 | -- | 5048 | Weiping Xiao, Yan Peng 0001, Chang Liu, Jiantao Gao, Yiqiang Wu, Xiaomao Li. Balanced Sample Assignment and Objective for Single-Model Multi-Class 3D Object Detection |
| 5049 | -- | 5061 | Chenglong Zhao, Yunxiang Zhang, Bingbing Ni. Exploiting Channel Similarity for Network Pruning |
| 5062 | -- | 5075 | Fei Zhou, Wei Wei 0008, Lei Zhang 0054, Yanning Zhang. Learning to Class-Adaptively Manipulate Embeddings for Few-Shot Learning |
| 5076 | -- | 5088 | Zexing Du, Xue Wang 0006, Qing Wang 0006. Self-Supervised Global Spatio-Temporal Interaction Pre-Training for Group Activity Recognition |
| 5089 | -- | 5101 | Xinchen Ye, Jinyi Zhang, Yazhi Yuan, Rui Xu 0002, Zhihui Wang, Haojie Li. Underwater Depth Estimation via Stereo Adaptation Networks |
| 5102 | -- | 5116 | Chuanming Tang, Xiao Wang 0014, Yuanchao Bai, Zhe Wu, Jianlin Zhang, Yongmei Huang. Learning Spatial-Frequency Transformer for Visual Object Tracking |
| 5117 | -- | 5132 | Yan Jin, Fang Gao, Jun Yu 0001, Jiabao Wang, Feng Shuang 0002. Multi-Object Tracking: Decoupling Features to Solve the Contradictory Dilemma of Feature Requirements |
| 5133 | -- | 5147 | Jing Li, Liu Yang, Qilong Wang, Qinghua Hu. WDAN: A Weighted Discriminative Adversarial Network With Dual Classifiers for Fine-Grained Open-Set Domain Adaptation |
| 5148 | -- | 5159 | Huanjie Tao, Qianyue Duan, Jianfeng An. An Adaptive Interference Removal Framework for Video Person Re-Identification |
| 5160 | -- | 5173 | Haihong Xiao, Yuqiong Li, Wenxiong Kang, Qiuxia Wu. Distinguishing and Matching-Aware Unsupervised Point Cloud Completion |
| 5174 | -- | 5185 | Zhilei Li, Jun Li 0072, Yuqing Ma, Rui Wang 0024, Zhi-Ping Shi 0002, Yifu Ding, Xianglong Liu 0001. Spatio-Temporal Adaptive Network With Bidirectional Temporal Difference for Action Recognition |
| 5186 | -- | 5199 | Yi Hou, Shanghang Zhang, Rui Ma, Huizhu Jia, Xiaodong Xie. Frame-Recurrent Video Crowd Counting |
| 5200 | -- | 5211 | Mingrui Zhu, Zicheng Wu, Nannan Wang 0001, Heng Yang, Xinbo Gao 0001. Dual Conditional Normalization Pyramid Network for Face Photo-Sketch Synthesis |
| 5212 | -- | 5226 | Sheng Cheng, Han Hu, Xinggong Zhang. ABRF: Adaptive BitRate-FEC Joint Control for Real-Time Video Streaming |
| 5227 | -- | 5241 | Yi Chen, Meng Wang 0017, Shiqi Wang 0001, Zhangkai Ni, Sam Kwong. A CTU-Level Screen Content Rate Control for Low-Delay Versatile Video Coding |
| 5242 | -- | 5256 | Yunlong Li, Xinfeng Zhang 0001, Chen Cui, Shanshe Wang, Siwei Ma. Fleet: Improving Quality of Experience for Low-Latency Live Video Streaming |
| 5257 | -- | 5270 | Huaiwen Zhang, Yang Yang, Fan Qi, Shengsheng Qian, Changsheng Xu. Debiased Video-Text Retrieval via Soft Positive Sample Calibration |
| 5271 | -- | 5280 | Jiwei Wei, Yang Yang 0002, Xing Xu 0001, Jingkuan Song, Guoqing Wang 0001, Heng Tao Shen. Less is Better: Exponential Loss for Cross-Modal Matching |
| 5281 | -- | 5295 | Xin Sun, Jialin Gao, Yizhe Zhu, Xuan Wang, Xi Zhou. Video Moment Retrieval via Comprehensive Relation-Aware Network |
| 5296 | -- | 5308 | Rong-Cheng Tu, Jie Jiang, Qinghong Lin, Chengfei Cai, Shangxuan Tian, Hongfa Wang, Wei Liu 0005. Unsupervised Cross-Modal Hashing With Modality-Interaction |
| 5309 | -- | 5317 | Sung-Jun Min, Kyeongbo Kong, Suk-Ju Kang. Out-of-Focus Image Deblurring for Mobile Display Vision Inspection |
| 5318 | -- | 5329 | Mixiao Hou, Zheng Zhang 0006, Chang Liu, Guangming Lu. Semantic Alignment Network for Multi-Modal Emotion Recognition |