SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

Yi Dong, Zhilin Wang, Makesh Narsimhan Sreedhar, Xianchao Wu, Oleksii Kuchaiev. SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF. In Houda Bouamor, Juan Pino, Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 11275-11288. Association for Computational Linguistics, 2023.
