L-STAP: Learned Spatio-Temporal Adaptive Pooling for Video Captioning

Danny Francis, Benoit Huet. L-STAP: Learned Spatio-Temporal Adaptive Pooling for Video Captioning. In Raphaël Troncy, Jorma Laaksonen, Hamed R. Tavakoli, Lyndon J. B. Nixon, Vasileios Mezaris, editors, Proceedings of the 1st International Workshop on AI for Smart TV Content Production, Access and Delivery, AI4TV@MM 2019, Nice, France, October 21, 2019, pages 33-41, ACM, 2019.

Abstract

Abstract is missing.