SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

Yi Dong, Zhilin Wang, Makesh Narsimhan Sreedhar, Xianchao Wu, Oleksii Kuchaiev. SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF. In Houda Bouamor, Juan Pino, Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 11275-11288. Association for Computational Linguistics, 2023.

Authors

Yi Dong

Zhilin Wang

Makesh Narsimhan Sreedhar

Xianchao Wu

Oleksii Kuchaiev
