MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering

Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou. MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023. Pages 14773-14783, IEEE, 2023.

