VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong. VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text. In Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 24206-24221, 2021.
