Red Teaming Language Models with Language Models

Ethan Perez, Saffron Huang, H. Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, Geoffrey Irving. Red Teaming Language Models with Language Models. In Yoav Goldberg, Zornitsa Kozareva, Yue Zhang, editors, Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, pages 3419-3448. Association for Computational Linguistics, 2022.

Authors

Ethan Perez

Saffron Huang

H. Francis Song

Trevor Cai

Roman Ring

John Aslanides

Amelia Glaese

Nat McAleese

Geoffrey Irving
