ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP

Lu Yan, Zhuo Zhang 0002, Guanhong Tao 0001, Kaiyuan Zhang 0002, Xuan Chen, Guangyu Shen, Xiangyu Zhang. ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10-16, 2023.

Authors

Lu Yan

Zhuo Zhang 0002

Guanhong Tao 0001

Kaiyuan Zhang 0002

Xuan Chen

Guangyu Shen

Xiangyu Zhang
