Direct Preference-based Policy Optimization without Reward Modeling

Gaon An, Junhyeok Lee, Xingdong Zuo, Norio Kosaka, Kyung Min Kim, Hyun Oh Song. Direct Preference-based Policy Optimization without Reward Modeling. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023. 2023.

@inproceedings{AnLZKKS23,
  title = {Direct Preference-based Policy Optimization without Reward Modeling},
  author = {Gaon An and Junhyeok Lee and Xingdong Zuo and Norio Kosaka and Kyung Min Kim and Hyun Oh Song},
  year = {2023},
  url = {http://papers.nips.cc/paper_files/paper/2023/hash/de8bd6b2b01cfa788e63f62e5b9a99b9-Abstract-Conference.html},
  researchr = {https://researchr.org/publication/AnLZKKS23},
  cites = {0},
  citedby = {0},
  booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
  editor = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
}
