- Large Generative Models Meet Multimodal Video Intelligence. Mike Zheng Shou. 1
- Unlocking Multimedia Capabilities of Gigantic Pretrained Language Models. Boyang Li. 3-4
- Multi-Modal Generative AI with Foundation Models. Ziwei Liu 0002. 5
- NeurSEG: A Segment Driven Deep Neural Model for Nested Named Entity Recognition. Zheng Wang, Fei Li, Cheng Long. 7-14
- SAT: Self-Attention Control for Diffusion Models Training. Jing Huang, Tianyi Zhang, Wei Shi. 15-22
- Multimodal Data Augmentation for Image Captioning using Diffusion Models. Changrong Xiao, Sean Xin Xu, Kunpeng Zhang. 23-33
- ImEW: A Framework for Editing Image in the Wild. Tasnim Mohiuddin, Tianyi Zhang, Maowen Nie, Jing Huang, Qianqian Chen, Wei Shi. 34-44
- CGSMP: Controllable Generative Summarization via Multimodal Prompt. Qian Yong, Jueqi Wei, Yiren Zhang, XiLun Zhang, Chao Wei, Simiao Chen, Yunhe Li, Cheng Ye, Bing Huang, Hao Wang. 45-50
- Generating Multimodal Augmentations with LLMs from Song Metadata for Music Information Retrieval. Federico Rossetto, Jeffrey Dalton 0001, Roderick Murray-Smith. 51-59
- Subsampling of Frequent Words in Text for Pre-training a Vision-Language Model. Mingliang Liang, Martha A. Larson. 61-67
- Fashion-GPT: Integrating LLMs with Fashion Retrieval System. Qianqian Chen, Tianyi Zhang, Maowen Nie, Zheng Wang, Shihao Xu, Wei Shi, Zhao Cao. 69-78