MLDataForge: Accelerating Large-Scale Dataset Preprocessing and Access for Multimodal Foundation Model Training

Andrea Blasi Núñez, Lukas Paul Achatius Galke, Peter Schneider-Kamp. MLDataForge: Accelerating Large-Scale Dataset Preprocessing and Access for Multimodal Foundation Model Training. In Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov, editors, Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, RANLP 2025, Varna, Bulgaria, September 8-10, 2025. pages 175-183, INCOMA Ltd., Shoumen, Bulgaria, 2025.
