DORB: Dynamically Optimizing Multiple Rewards with Bandits

Ramakanth Pasunuru, Han Guo, Mohit Bansal. DORB: Dynamically Optimizing Multiple Rewards with Bandits. In Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu, editors, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, pages 7766-7780. Association for Computational Linguistics, 2020.