Jiedong Zhuang, Lu Lu, Ming Dai, Rui Hu, Jian Chen, Qiang Liu, Haoji Hu. ST3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming. In Toby Walsh, Julie Shah, Zico Kolter, editors, AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA, USA. pages 11049-11057, AAAI Press, 2025.