Tong Zhao, Junping Du, Zhe Xue, MeiYu Liang, Aijing Li, Xiaolong Meng, Dandan Liu. ST-VLM: A Spatial-to-Image Multimodal Spatial-Temporal Prediction Framework with Vision-Language Model. In Sven Koenig, Chad Jenkins, Matthew E. Taylor, editors, Fortieth AAAI Conference on Artificial Intelligence, Thirty-Eighth Conference on Innovative Applications of Artificial Intelligence, Sixteenth Symposium on Educational Advances in Artificial Intelligence, AAAI 2026, Singapore, January 20-27, 2026. Pages 16441-16449, AAAI Press, 2026.