PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation

Alexandre Piché, Ehsan Kamalloo, Rafael Pardinas, Xiaoyin Chen, Dzmitry Bahdanau. PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation. Trans. Mach. Learn. Res., 2026, 2026.
