Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas 0001, Angie W. Boggust, Rameswar Panda, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Michael Picheny, Shih-Fu Chang. Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021. pages 7992-8001, IEEE, 2021. [doi]
Abstract is missing.