Red Teaming Language Models with Language Models

Ethan Perez, Saffron Huang, H. Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, Geoffrey Irving. Red Teaming Language Models with Language Models. In Yoav Goldberg, Zornitsa Kozareva, Yue Zhang, editors, Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, pages 3419-3448. Association for Computational Linguistics, 2022.

@inproceedings{PerezHSCRAGMI22,
  title = {Red Teaming Language Models with Language Models},
  author = {Ethan Perez and Saffron Huang and H. Francis Song and Trevor Cai and Roman Ring and John Aslanides and Amelia Glaese and Nat McAleese and Geoffrey Irving},
  year = {2022},
  url = {https://aclanthology.org/2022.emnlp-main.225},
  researchr = {https://researchr.org/publication/PerezHSCRAGMI22},
  pages = {3419--3448},
  booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11},
  editor = {Yoav Goldberg and Zornitsa Kozareva and Yue Zhang},
  publisher = {Association for Computational Linguistics},
}