CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models

Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, Julian J. McAuley. CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2023, New Paltz, NY, USA, October 22-25, 2023. pages 1-5, IEEE, 2023.
