- LLMJudge: LLMs for Relevance Judgments. Hossein A. Rahmani, Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Paul Thomas, Charles L. A. Clarke, Mohammad Aliannejadi, Clemencia Siro, Guglielmo Faggioli. 1-3
- The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches. Bhashithe Abeysinghe, Ruhan Circi. 4-18
- Exploring Large Language Models for Relevance Judgments in Tetun. Gabriel de Jesus, Sérgio Sobral Nunes. 19-30
- EXAM++: LLM-based Answerability Metrics for IR Evaluation. Naghmeh Farzi, Laura Dietz. 31-50
- A Novel Evaluation Framework for Image2Text Generation. Jia-Hong Huang, Hongyi Zhu, Yixian Shen, Stevan Rudinac, Alessio M. Pacces, Evangelos Kanoulas. 51-65
- Using LLMs to Investigate Correlations of Conversational Follow-up Queries with User Satisfaction. Hyunwoo Kim, Yoonseo Choi, Taehyun Yang, Honggu Lee, Chaneon Park, Yongju Lee, Jin Young Kim, Juho Kim. 66-91
- Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework. Zackary Rackauckas, Arthur Câmara, Jakub Zavrel. 92-112
- Toward Automatic Relevance Judgment using Vision-Language Models for Image-Text Retrieval Evaluation. Jheng-Hong Yang, Jimmy Lin. 113-123