Truth Behind the Scene: Designing Evaluations Benchmarks to Assess LLMs' Task-Specific Understanding over Test-Taking Strategies


Thao Pham. Truth Behind the Scene: Designing Evaluations Benchmarks to Assess LLMs' Task-Specific Understanding over Test-Taking Strategies. In Toby Walsh, Julie Shah, Zico Kolter, editors, AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA, USA. pages 29596-29598, AAAI Press, 2025.

