OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh. OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10-16, 2023.

Authors

Hugo Laurençon

Lucile Saulnier

Léo Tronchon

Stas Bekman

Amanpreet Singh

Anton Lozhkov

Thomas Wang

Siddharth Karamcheti

Alexander M. Rush

Douwe Kiela

Matthieu Cord

Victor Sanh