Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction

Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed. Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022.

