Leveraging Text Representation and Face-head Tracking for Long-form Multimodal Semantic Relation Understanding

Raksha Ramesh, Vishal Anand, Zifan Chen, Yifei Dong, Yun Chen, Ching-Yung Lin. Leveraging Text Representation and Face-head Tracking for Long-form Multimodal Semantic Relation Understanding. In João Magalhães, Alberto Del Bimbo, Shin'ichi Satoh, Nicu Sebe, Xavier Alameda-Pineda, Qin Jin, Vincent Oria, Laura Toni, editors, MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10-14, 2022, pages 7215-7219. ACM, 2022.
