Unsupervised Audio-Caption Aligning Learns Correspondences Between Individual Sound Events and Textual Phrases

Huang Xie, Okko Räsänen, Konstantinos Drossos, Tuomas Virtanen. Unsupervised Audio-Caption Aligning Learns Correspondences Between Individual Sound Events and Textual Phrases. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022. Pages 8867-8871, IEEE, 2022. doi:10.1109/ICASSP43922.2022.9747336

@inproceedings{XieRDV22,
  title = {Unsupervised Audio-Caption Aligning Learns Correspondences Between Individual Sound Events and Textual Phrases},
  author = {Huang Xie and Okko Räsänen and Konstantinos Drossos and Tuomas Virtanen},
  year = {2022},
  doi = {10.1109/ICASSP43922.2022.9747336},
  url = {https://doi.org/10.1109/ICASSP43922.2022.9747336},
  researchr = {https://researchr.org/publication/XieRDV22},
  cites = {0},
  citedby = {0},
  pages = {8867-8871},
  booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022},
  publisher = {IEEE},
  isbn = {978-1-6654-0540-9},
}
