The following publications may be variants of this publication:
- ResuFormer: Semantic Structure Understanding for Resumes via Multi-Modal Pre-training. Kaichun Yao, Jingshuai Zhang, Chuan Qin 0002, Xin Song, Peng Wang, Hengshu Zhu, Hui Xiong. icde 2023: 3154-3167 [doi]
- LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding. Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui 0001, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei A. F. Florêncio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. acl 2021: 2579-2591 [doi]
- MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding. Zilong Wang 0002, Jiuxiang Gu, Chris Tensmeyer, Nikolaos Barmpalios, Ani Nenkova, Tong Sun, Jingbo Shang, Vlad I. Morariu. emnlp 2022: 3984-3993 [doi]
- Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens. Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe 0001, Yong Man Ro. icassp 2024: 7970-7974 [doi]
- Fpcode: an Efficient Approach for Multi-Modal Biometrics. LinLin Shen, Li Bai, Zhen Ji. ijprai, 25(2):273-286, 2011. [doi]