Aligning video regions with action descriptions for open-vocabulary spatio-temporal action detection

Tao Wu, Shuqiu Ge, Jiaqi Li, Xi Chen, Liang Li, Limin Wang. Aligning video regions with action descriptions for open-vocabulary spatio-temporal action detection. Computer Vision and Image Understanding, 270:104825, 2026.

Abstract

Abstract is missing.