Train a Unified Multimodal Data Quality Classifier with Synthetic Data

Weizhi Wang, Rongmei Lin, Shiyang Li, Colin Lockard, Ritesh Sarkhel, Sanket Lokegaonkar, Jingbo Shang, Xifeng Yan, Nasser Zalmout, Xian Li. Train a Unified Multimodal Data Quality Classifier with Synthetic Data. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng, editors, Findings of the Association for Computational Linguistics: EMNLP 2025, Suzhou, China, November 4-9, 2025, pages 1972-1986. Association for Computational Linguistics, 2025.

Authors

Weizhi Wang

Rongmei Lin

Shiyang Li

Colin Lockard

Ritesh Sarkhel

Sanket Lokegaonkar

Jingbo Shang

Xifeng Yan

Nasser Zalmout

Xian Li