Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning

Hai-ming Xu, Qi Chen, Lei Wang, Lingqiao Liu. Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning. In Toby Walsh, Julie Shah, Zico Kolter, editors, AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA, USA. pages 8851-8859, AAAI Press, 2025. [doi]

Abstract

Abstract is missing.