Unsupervised Audio-Caption Aligning Learns Correspondences Between Individual Sound Events and Textual Phrases

Huang Xie, Okko Räsänen, Konstantinos Drossos, Tuomas Virtanen. Unsupervised Audio-Caption Aligning Learns Correspondences Between Individual Sound Events and Textual Phrases. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, pages 8867-8871. IEEE, 2022.

Authors

Huang Xie

Okko Räsänen

Konstantinos Drossos

Tuomas Virtanen
