- WRF: Weighted Rouge-F1 Metric for Entity Recognition. Lukas Weber, Krishnan Jothi Ramalingam, Matthias Beyer, Axel Zimmermann. 1-11
- Assessing Distractors in Multiple-Choice Tests. Vatsal Raina, Adian Liusie, Mark J. F. Gales. 12-22
- Delving into Evaluation Metrics for Generation: A Thorough Assessment of How Metrics Generalize to Rephrasing Across Languages. Yixuan Wang, Qingyan Chen, Duygu Ataman. 23-31
- EduQuick: A Dataset Toward Evaluating Summarization of Informal Educational Content for Social Media. Zahra Kolagar, Sebastian Steindl, Alessandra Zarcone. 32-48
- Zero-shot Probing of Pretrained Language Models for Geography Knowledge. Nitin Ramrakhiyani, Vasudeva Varma, Girish K. Palshikar, Sachin Pawar. 49-61
- Transformers Go for the LOLs: Generating (Humourous) Titles from Scientific Abstracts End-to-End. Yanran Chen, Steffen Eger. 62-84
- Summary Cycles: Exploring the Impact of Prompt Engineering on Large Language Models' Interaction with Interaction Log Information. Jeremy Block, Yu-Peng Chen, Abhilash Budharapu, Lisa Anthony, Bonnie J. Dorr. 85-99
- Large Language Models As Annotators: A Preliminary Evaluation For Annotating Low-Resource Language Content. Savita Bhat, Vasudeva Varma. 100-107
- Can a Prediction's Rank Offer a More Accurate Quantification of Bias? A Case Study Measuring Sexism in Debiased Language Models. Jad Doughman, Shady Shehata, Leen Al Qadi, Youssef Nafea, Fakhri Karray. 108-116
- The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics. Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror, Steffen Eger. 117-138
- HIT-MI&T Lab's Submission to Eval4NLP 2023 Shared Task. Rui Zhang, Fuhai Song, Hui Huang, Jinghao Yuan, Muyun Yang, Tiejun Zhao. 139-148
- Understanding Large Language Model Based Metrics for Text Summarization. Abhishek Pradhan, Ketan Kumar Todi. 149-155
- LTRC_IIITH's 2023 Submission for Prompting Large Language Models as Explainable Metrics Task. Pavan Baswani, Ananya Mukherjee, Manish Shrivastava. 156-163
- Which is better? Exploring Prompting Strategy For LLM-based Metrics. Joonghoon Kim, Sangmin Lee, Seung-Hun Han, Saeran Park, Jiyoon Lee, Kiyoon Jeong, Pilsung Kang. 164-183
- Characterised LLMs Affect its Evaluation of Summary and Translation. Yuan Lu, Yu-Ting Lin. 184-192
- Reference-Free Summarization Evaluation with Large Language Models. Abbas Akkasi, Kathleen C. Fraser, Majid Komeili. 193-201
- Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task. Neema Kotonya, Saran Krishnasamy, Joel R. Tetreault, Alejandro Jaimes. 202-218
- Exploring Prompting Large Language Models as Explainable Metrics. Ghazaleh Mahmoudi. 219-227
- Team NLLG submission for Eval4NLP 2023 Shared Task: Retrieval-Augmented In-Context Learning for NLG Evaluation. Daniil Larionov, Vasiliy Viskov, George Kokush, Alexander Panchenko, Steffen Eger. 228-234