Hierarchical multimodal attention for end-to-end audio-visual scene-aware dialogue response generation

Hung Le, Doyen Sahoo, Nancy F. Chen, Steven C. H. Hoi. Hierarchical multimodal attention for end-to-end audio-visual scene-aware dialogue response generation. Computer Speech & Language, 63:101095, 2020.

@article{LeSCH20,
  title = {Hierarchical multimodal attention for end-to-end audio-visual scene-aware dialogue response generation},
  author = {Hung Le and Doyen Sahoo and Nancy F. Chen and Steven C. H. Hoi},
  year = {2020},
  doi = {10.1016/j.csl.2020.101095},
  url = {https://doi.org/10.1016/j.csl.2020.101095},
  researchr = {https://researchr.org/publication/LeSCH20},
  journal = {Computer Speech \& Language},
  volume = {63},
  pages = {101095},
}