HARM: Learning Hate-Aware Reward Model for Evaluating Natural Language Explanations of Offensive Content

Lorenzo Puppi Vecchi, Alceu de Souza Britto Jr., Emerson Cabrera Paraiso, Rafael M. O. Cruz. HARM: Learning Hate-Aware Reward Model for Evaluating Natural Language Explanations of Offensive Content. In Vera Demberg, Kentaro Inui, Lluís Marquez, editors, Findings of the Association for Computational Linguistics: EACL 2026, Rabat, Morocco, March 24-29, 2026, pages 4393-4431. Association for Computational Linguistics, 2026.

Authors

Lorenzo Puppi Vecchi

Alceu de Souza Britto Jr.

Emerson Cabrera Paraiso

Rafael M. O. Cruz
