See, move and hear: a local-to-global multi-modal interaction network for video action recognition

Fan Feng, Yue Ming, Nannan Hu, Jiangwan Zhou. See, move and hear: a local-to-global multi-modal interaction network for video action recognition. Appl. Intell., 53(16):19765-19784, August 2023.
