- Cross-Modal Mapping for Generalized Zero-Shot Learning by Soft-Labeling. Shabnam Daghaghi, Anshumali Shrivastava, Tharun Medini. [doi]
- Learning Question-Guided Video Representation for Multi-Turn Video Question Answering. Guan-Lin Chao, Abhinav Rastogi, Semih Yavuz, Dilek Hakkani-Tür, Jindong Chen, Ian Lane. [doi]
- A Simple Baseline for Visual Commonsense Reasoning. Jingxiang Lin, Unnat Jain, Alexander G. Schwing. [doi]
- A Perspective on Multi-Agent Communication for Information Fusion. Homagni Saha, Vijay Venkataraman, Alberto Speranzon, Soumik Sarkar. [doi]
- Self-Educated Language Agent with Hindsight Experience Replay for Instruction Following. Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin. [doi]
- General Evaluation for Instruction Conditioned Navigation using Dynamic Time Warping. Gabriel Ilharco, Vihan Jain, Alexander Ku, Eugene Ie, Jason Baldridge. [doi]
- Shaping Visual Representations with Language for Few-shot Classification. Jesse Mu, Percy Liang, Noah D. Goodman. [doi]
- Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning. Khanh Nguyen, Hal Daumé III. [doi]
- On Agreements in Visual Understanding. Yassine Mrabet, Dina Demner-Fushman. [doi]
- Community Size Effect in Artificial Learning Systems. Olivier Tieleman, Angeliki Lazaridou, Shibl Mourad, Charles Blundell, Doina Precup. [doi]
- Deep Compositional Robotic Planners That Follow Natural Language Commands. Yen-ling Kuo, Boris Katz, Andrei Barbu.
- A Comprehensive Analysis of Semantic Compositionality in Text-to-Image Generation. Chihiro Fujiyama, Ichiro Kobayashi. [doi]
- Visual Dialog for Radiology: Data Curation and First Steps. Olga Kovaleva, Chaitanya Shivade, Satyananda Kashyap, Karina Kanjaria, Adam Coy, Deddeh Ballah, Yufan Guo, Joy T. Wu, Alexandros Karargyris, David Beymer, Anna Rumshisky, Vandana Mukherjee. [doi]
- Not All Actions Are Equal: Learning to Stop in Language-Grounded Urban Navigation. Jiannan Xiang, Xin Wang, William Yang Wang. [doi]
- Learning from Observation-Only Demonstration for Task-Oriented Language Grounding via Self-Examination. Tsu-Jui Fu, Yuta Tsuboi, Sosuke Kobayashi, Yuta Kikuchi. [doi]
- Supervised Multimodal Bitransformers for Classifying Images and Text. Douwe Kiela, Suvrat Bhooshan, Hamed Firooz, Davide Testuggine. [doi]
- Analyzing Compositionality in Visual Question Answering. Sanjay Subramanian, Sameer Singh, Matt Gardner. [doi]
- CLOSURE: Assessing Systematic Generalization of CLEVR Models. Harm de Vries, Dzmitry Bahdanau, Shikhar Murty, Aaron C. Courville, Philippe Beaudoin. [doi]
- Hidden State Guidance: Improving Image Captioning Using an Image Conditioned Autoencoder. Jialin Wu, Raymond J. Mooney. [doi]
- Induced Attention Invariance: Defending VQA Models against Adversarial Attacks. Vasu Sharma, Ankita Kalra, Louis-Philippe Morency. [doi]
- Learning Language from Vision. Candace Ross, Cheahuychou Mao, Boris Katz, Andrei Barbu. [doi]
- Contextual Grounding of Natural Language Entities in Images. Farley Lai, Ning Xie, Derek Doran, Asim Kadav. [doi]
- Commonsense and Semantic-Guided Navigation through Language in Embodied Environment. Dian Yu, Chandra Khatri, Alexandros Papangelis, Andrea Madotto, Mahdi Namazifar, Joost Huizinga, Adrien Ecoffet, Huaixiu Zheng, Piero Molino, Jeff Clune, Zhou Yu, Kenji Sagae, Gökhan Tür.
- Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence. Thomas Sutter, Imant Daunhawer, Julia Vogt. [doi]
- Visually Grounded Video Reasoning in Selective Attention Memory. T. S. Jayram, Vincent Albouy, Tomasz Kornuta, Emre Sevgen, Ahmet S. Ozcan. [doi]
- Structural and Functional Learning for Learning Language Use. Angeliki Lazaridou, Anna Potapenko, Olivier Tieleman. [doi]
- Leveraging Topics and Audio Features with Multimodal Attention for Audio Visual Scene-Aware Dialog. Shachi H. Kumar, Eda Okur, Saurav Sahay, Jonathan Huang, Lama Nachman. [doi]
- Can Adversarial Training Learn Image Captioning? Jean-Benoit Delbrouck. [doi]
- Language Grounding through Social Interactions and Curiosity-Driven Multi-Goal Learning. Nicolas Lair, Cédric Colas, Rémy Portelas, Jean-Michel Dussoux, Peter F. Dominey, Pierre-Yves Oudeyer. [doi]
- Situated Grounding Facilitates Multimodal Concept Learning for AI. Nikhil Krishnaswamy, James Pustejovsky. [doi]
- What Is Needed for Simple Spatial Language Capabilities in VQA? Alexander Kuhnle, Ann A. Copestake. [doi]
- VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering. Catalina Cangea, Eugene Belilovsky, Pietro Liò, Aaron C. Courville. [doi]
- Natural Language Grounded Multitask Navigation. Xin Wang, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa Kozareva, Sujith Ravi. [doi]
- Recurrent Instance Segmentation using Sequences of Referring Expressions. Alba Maria Hererra-Palacio, Carles Ventura, Carina Silberer, Ionut-Teodor Sorodoc, Gemma Boleda, Xavier Giró i Nieto. [doi]
- Modulated Self-attention Convolutional Network for VQA. Jean-Benoit Delbrouck. [doi]