WaveTransformer: An Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information

An Tran, Konstantinos Drossos, Tuomas Virtanen. WaveTransformer: An Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information. In 29th European Signal Processing Conference, EUSIPCO 2021, Dublin, Ireland, August 23-27, 2021, pages 576-580. IEEE, 2021. doi:10.23919/EUSIPCO54536.2021.9616340

@inproceedings{TranDV21,
  title = {WaveTransformer: An Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information},
  author = {An Tran and Konstantinos Drossos and Tuomas Virtanen},
  year = {2021},
  doi = {10.23919/EUSIPCO54536.2021.9616340},
  url = {https://doi.org/10.23919/EUSIPCO54536.2021.9616340},
  researchr = {https://researchr.org/publication/TranDV21},
  cites = {0},
  citedby = {0},
  pages = {576--580},
  booktitle = {29th European Signal Processing Conference, EUSIPCO 2021, Dublin, Ireland, August 23-27, 2021},
  publisher = {IEEE},
  isbn = {978-9-0827-9706-0},
}