Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models

Maribeth Rauh, John Mellor, Jonathan Uesato, Po-Sen Huang, Johannes Welbl, Laura Weidinger, Sumanth Dathathri, Amelia Glaese, Geoffrey Irving, Iason Gabriel, William Isaac, Lisa Anne Hendricks. Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022.

Authors

Maribeth Rauh

John Mellor

Jonathan Uesato

Po-Sen Huang

Johannes Welbl

Laura Weidinger

Sumanth Dathathri

Amelia Glaese

Geoffrey Irving

Iason Gabriel

William Isaac

Lisa Anne Hendricks
