Nevan Wichers, Carson Denison, Ahmad Beirami. Gradient-Based Language Model Red Teaming. In Yvette Graham, Matthew Purver, editors, Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - Volume 1: Long Papers, St. Julian's, Malta, March 17-22, 2024, pages 2862-2881. Association for Computational Linguistics, 2024.