Multimodal fusion with vision-language-action models for robotic manipulation: A systematic review

Muhayy Ud Din, Waseem Akram 0001, Lyes Saad Saoud, Jan Rosell, Irfan Hussain. Multimodal fusion with vision-language-action models for robotic manipulation: A systematic review. Information Fusion, 129:104062, 2026.

Authors

Muhayy Ud Din

Waseem Akram 0001

Lyes Saad Saoud

Jan Rosell

Irfan Hussain
