A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer

Vladimir Iashin, Esa Rahtu. A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer. In 31st British Machine Vision Conference 2020, BMVC 2020, Virtual Event, UK, September 7-10, 2020. BMVA Press, 2020.

Authors

Vladimir Iashin

Esa Rahtu
