Multi-Modal Video Summarization Based on Two-Stage Fusion of Audio, Visual, and Recognized Text Information

Zekun Yang, Jiajun He, Tomoki Toda. Multi-Modal Video Summarization Based on Two-Stage Fusion of Audio, Visual, and Recognized Text Information. In Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024, Macau, December 3-6, 2024. pages 1-6, IEEE, 2024.

Authors

Zekun Yang

Jiajun He

Tomoki Toda
