Proceedings of The First Workshop on Large Language Models for Evaluation in Information Retrieval (LLM4Eval 2024) co-located with 10th International Conference on Online Publishing (SIGIR 2024), Washington D.C., USA, July 18, 2024

Clemencia Siro, Mohammad Aliannejadi, Hossein A. Rahmani, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra 0001, Paul Thomas 0001, Emine Yilmaz, editors, Proceedings of The First Workshop on Large Language Models for Evaluation in Information Retrieval (LLM4Eval 2024) co-located with 10th International Conference on Online Publishing (SIGIR 2024), Washington D.C., USA, July 18, 2024. Volume 3752 of CEUR Workshop Proceedings, CEUR-WS.org, 2024.

Conference: llm4eval2024

LLMJudge: LLMs for Relevance Judgments. Hossein A. Rahmani, Emine Yilmaz, Nick Craswell, Bhaskar Mitra 0001, Paul Thomas 0001, Charles L. A. Clarke, Mohammad Aliannejadi, Clemencia Siro, Guglielmo Faggioli. pp. 1-3

The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches. Bhashithe Abeysinghe, Ruhan Circi. pp. 4-18

Exploring Large Language Models for Relevance Judgments in Tetun. Gabriel de Jesus, Sérgio Sobral Nunes. pp. 19-30

EXAM++: LLM-based Answerability Metrics for IR Evaluation. Naghmeh Farzi, Laura Dietz. pp. 31-50

A Novel Evaluation Framework for Image2Text Generation. Jia-Hong Huang, Hongyi Zhu, Yixian Shen, Stevan Rudinac, Alessio M. Pacces, Evangelos Kanoulas. pp. 51-65

Using LLMs to Investigate Correlations of Conversational Follow-up Queries with User Satisfaction. Hyunwoo Kim, Yoonseo Choi, Taehyun Yang, Honggu Lee, Chaneon Park, Yongju Lee, Jin Young Kim, Juho Kim. pp. 66-91

Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework. Zackary Rackauckas, Arthur Câmara, Jakub Zavrel. pp. 92-112

Toward Automatic Relevance Judgment using Vision-Language Models for Image-Text Retrieval Evaluation. Jheng-Hong Yang, Jimmy Lin. pp. 113-123