The following publications are possibly variants of this publication:
- SpiritSight Agent: Advanced GUI Agent with One Look. Zhiyuan Huang, Ziming Cheng, Junting Pan, Zhaohui Hou, Mingjie Zhan. CVPR 2025: 29490-29500
- CogAgent: A Visual Language Model for GUI Agents. Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxiao Dong, Ming Ding, Jie Tang. CVPR 2024: 14281-14290
- GUI Agents: A Survey. Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Xia, Xintong Li, Jing Shi, Hongjie Chen, Viet Dac Lai, Zhouhang Xie, SungChul Kim, Ruiyi Zhang, Tong Yu, Md. Mehrab Tanjim, Nesreen K. Ahmed, Puneet Mathur, Seunghyun Yoon, Lina Yao, Branislav Kveton, Jihyung Kil, Thien Huu Nguyen, Trung Bui, Tianyi Zhou, Ryan A. Rossi, Franck Dernoncourt. ACL 2025: 22522-22538