Yonas Atinafu, Robin Cohen. RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents. In 42nd IEEE International Conference on Data Engineering, ICDE 2026 - Workshops, Montreal, QC, Canada, May 4-8, 2026. Pages 27-34, IEEE, 2026.