DyCoT-RE: Chain-of-Thought-enhanced LLM reward engineering with dual-dynamic optimization for reinforcement learning

Xinning Zhu, Jinxin Du, Longfei Huang, Lunde Chen. DyCoT-RE: Chain-of-Thought-enhanced LLM reward engineering with dual-dynamic optimization for reinforcement learning. Neurocomputing, 695:133945, 2026.

Abstract

Abstract is missing.