Multi-Modal Video Summarization Based on Two-Stage Fusion of Audio, Visual, and Recognized Text Information

Zekun Yang, Jiajun He, Tomoki Toda. Multi-Modal Video Summarization Based on Two-Stage Fusion of Audio, Visual, and Recognized Text Information. In Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024, Macau, December 3-6, 2024, pages 1-6. IEEE, 2024.
