Multimodal Attention Fusion for Target Speaker Extraction

Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki. Multimodal Attention Fusion for Target Speaker Extraction. In IEEE Spoken Language Technology Workshop, SLT 2021, Shenzhen, China, January 19-22, 2021, pages 778-784. IEEE, 2021.

@inproceedings{SatoOKDNA21,
  title = {Multimodal Attention Fusion for Target Speaker Extraction},
  author = {Hiroshi Sato and Tsubasa Ochiai and Keisuke Kinoshita and Marc Delcroix and Tomohiro Nakatani and Shoko Araki},
  year = {2021},
  doi = {10.1109/SLT48900.2021.9383539},
  url = {https://doi.org/10.1109/SLT48900.2021.9383539},
  researchr = {https://researchr.org/publication/SatoOKDNA21},
  cites = {0},
  citedby = {0},
  pages = {778--784},
  booktitle = {IEEE Spoken Language Technology Workshop, SLT 2021, Shenzhen, China, January 19-22, 2021},
  publisher = {IEEE},
  isbn = {978-1-7281-7066-4},
}