Tianhao Wu, Weizhe Yuan, Olga Golovneva, Jing Xu, Yuandong Tian, Jiantao Jiao, Jason E. Weston, Sainbayar Sukhbaatar. Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng, editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, EMNLP 2025, Suzhou, China, November 4-9, 2025. Pages 11537-11554, Association for Computational Linguistics, 2025.