- Confidence-Calibrated Face Image Forgery Detection with Contrastive Representation Distillation. Puning Yang, Huaibo Huang, Zhiyong Wang 0001, Aijing Yu, Ran He. 3-19
- Exposing Face Forgery Clues via Retinex-Based Image Enhancement. Han Chen, Yuzhen Lin, Bin Li 0011. 20-34
- GB-CosFace: Rethinking Softmax-Based Face Recognition from the Perspective of Open Set Classification. Mingqiang Chen, Lizhe Liu, Xiaohao Chen, Siyu Zhu 0001. 35-51
- Learning Video-Independent Eye Contact Segmentation from In-the-Wild Videos. Tianyi Wu, Yusuke Sugano. 52-70
- Exemplar Free Class Agnostic Counting. Viresh Ranjan, Minh Hoai Nguyen. 71-87
- Emphasizing Closeness and Diversity Simultaneously for Deep Face Representation. Chaoyu Zhao, Jianjun Qian, Shumin Zhu, Jin Xie 0001, Jian Yang 0003. 88-104
- KinStyle: A Strong Baseline Photorealistic Kinship Face Synthesis with an Optimized StyleGAN Encoder. Li-Chen Cheng, Shu-Chuan Hsu, Pin-Hua Lee, Hsiu-Chieh Lee, Che-Hsien Lin, Jun-Cheng Chen, Chih-Yu Wang 0001. 105-120
- Occluded Facial Expression Recognition Using Self-supervised Learning. Jiahe Wang, Heyan Ding, Shangfei Wang. 121-136
- Heterogeneous Avatar Synthesis Based on Disentanglement of Topology and Rendering. Nan Gao, Zhi Zeng, Guixuan Zhang, Shuwu Zhang. 137-152
- Focal and Global Spatial-Temporal Transformer for Skeleton-Based Action Recognition. Zhimin Gao, Peitao Wang, Pei Lv, Xiaoheng Jiang, Qidong Liu, Pichao Wang, Mingliang Xu, Wanqing Li 0001. 155-171
- Spatial-Temporal Adaptive Graph Convolutional Network for Skeleton-Based Action Recognition. Rui Hang, Minxian Li. 172-188
- 3D Pose Based Feedback for Physical Exercises. Ziyi Zhao, Sena Kiciroglu, Hugues Vinzant, Yuan Cheng, Isinsu Katircioglu, Mathieu Salzmann, Pascal Fua. 189-205
- Generating Multiple Hypotheses for 3D Human Mesh and Pose Using Conditional Generative Adversarial Nets. Xu Zheng, Yali Zheng, Shubing Yang. 206-222
- SCOAD: Single-Frame Click Supervision for Online Action Detection. Na Ye, Xing Zhang, Dawei Yan, Wei Dong, Qingsen Yan. 223-238
- Neural Puppeteer: Keypoint-Based Neural Rendering of Dynamic Shapes. Simon Giebenhain, Urs Waldmann, Ole Johannsen, Bastian Goldluecke. 239-256
- Decanus to Legatus: Synthetic Training for 2D-3D Human Pose Lifting. Yue Zhu, David Picard. 257-274
- Social Aware Multi-modal Pedestrian Crossing Behavior Prediction. Xiaolin Zhai, Zhengxi Hu, Dingye Yang, Lei Zhou, Jingtai Liu. 275-290
- Action Representing by Constrained Conditional Mutual Information. Haoyuan Gao, Yifan Zhang, Linhui Sun, Jian Cheng. 291-306
- Temporal-Viewpoint Transportation Plan for Skeletal Few-Shot Action Recognition. Lei Wang 0108, Piotr Koniusz. 307-326
- Spatial Temporal Network for Image and Skeleton Based Group Activity Recognition. Xiaolin Zhai, Zhengxi Hu, Dingye Yang, Lei Zhou, Jingtai Liu. 329-346
- Learning Using Privileged Information for Zero-Shot Action Recognition. Zhiyi Gao, Yonghong Hou, Wanqing Li 0001, Zihui Guo, Bin Yu. 347-362
- MGTR: End-to-End Mutual Gaze Detection with Transformer. Hang Guo, Zhengxi Hu, Jingtai Liu. 363-378
- Is an Object-Centric Video Representation Beneficial for Transfer? Chuhan Zhang, Ankush Gupta 0001, Andrew Zisserman. 379-397
- DCVQE: A Hierarchical Transformer for Video Quality Assessment. Zutong Li, Lei Yang. 398-416
- Not End-to-End: Explore Multi-Stage Architecture for Online Surgical Phase Recognition. Fangqiu Yi, Yanfeng Yang, Tingting Jiang. 417-432
- FunnyNet: Audiovisual Learning of Funny Moments in Videos. Zhi-Song Liu, Robin Courant, Vicky Kalogeiton. 433-450
- ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval. Adriano Fragomeni, Michael Wray, Dima Damen. 451-468
- A Compressive Prior Guided Mask Predictive Coding Approach for Video Analysis. Zhimeng Huang, Chuanmin Jia, Shanshe Wang, Siwei Ma. 469-484
- BaSSL: Boundary-aware Self-Supervised Learning for Video Scene Segmentation. Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim. 485-501
- HaViT: Hybrid-Attention Based Vision Transformer for Video Classification. Li Li, Liansheng Zhuang, Shenghua Gao, Shafei Wang. 502-517
- From Sparse to Dense: Semantic Graph Evolutionary Hashing for Unsupervised Cross-Modal Retrieval. Yang Zhao, Jiaguo Yu, Shengbin Liao, Zheng Zhang, Haofeng Zhang. 521-536
- SST-VLM: Sparse Sampling-Twice Inspired Video-Language Model. Yizhao Gao, Zhiwu Lu 0001. 537-553
- PromptLearner-CLIP: Contrastive Multi-Modal Action Representation Learning with Context Optimization. ZhenXing Zheng, GaoYun An, Shan Cao, Zhaoqilin Yang, Qiuqi Ruan. 554-570
- Causal Property Based Anti-conflict Modeling with Hybrid Data Augmentation for Unbiased Scene Graph Generation. Ruonan Zhang, GaoYun An. 571-587
- gScoreCAM: What Objects Is CLIP Looking At? Peijie Chen, Qi Li, Saad Biaz, Trung Bui, Anh Nguyen. 588-604
- From Within to Between: Knowledge Distillation for Cross Modality Retrieval. Vinh Tran 0005, Niranjan Balasubramanian, Minh Hoai. 605-622
- Thinking Hallucination for Video Captioning. Nasib Ullah, Partha Pratim Mohanta. 623-640
- Boundary-Aware Temporal Sentence Grounding with Adaptive Proposal Refinement. Jianxiang Dong, Zhaozheng Yin. 641-657
- Two-Stage Multimodality Fusion for High-Performance Text-Based Visual Question Answering. Bingjia Li, Jie Wang, Minyi Zhao, Shuigeng Zhou. 658-674
- Bright as the Sun: In-depth Analysis of Imagination-Driven Image Captioning. Huyen Thi Thanh Tran, Takayuki Okatani. 675-691
- Heterogeneous Interactive Learning Network for Unsupervised Cross-Modal Retrieval. Yuanchao Zheng, Xiaowei Zhang 0003. 692-707
- GaitStrip: Gait Recognition via Effective Strip-Based Feature Representations and Multi-level Framework. Ming Wang, Beibei Lin, Xianda Guo, Lincheng Li, Zheng Zhu, Jiande Sun, Shunli Zhang, Yu Liu, Xin Yu. 711-727
- Soft Label Mining and Average Expression Anchoring for Facial Expression Recognition. Haipeng Ming, Wenhuan Lu, Wei Zhang. 728-744
- 'Labelling the Gaps': A Weakly Supervised Automatic Eye Gaze Estimation. Shreya Ghosh 0001, Abhinav Dhall, Munawar Hayat, Jarrod Knibbe. 745-763