Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR

Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma. Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR. In Hanseok Ko, John H. L. Hansen, editors, Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022, pages 1016-1020. ISCA, 2022.

@inproceedings{WeiZSXM22-0,
  title = {Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational {ASR}},
  author = {Kun Wei and Yike Zhang and Sining Sun and Lei Xie and Long Ma},
  year = {2022},
  doi = {10.21437/Interspeech.2022-10326},
  url = {https://doi.org/10.21437/Interspeech.2022-10326},
  researchr = {https://researchr.org/publication/WeiZSXM22-0},
  cites = {0},
  citedby = {0},
  pages = {1016--1020},
  booktitle = {Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022},
  editor = {Hanseok Ko and John H. L. Hansen},
  publisher = {ISCA},
}