- Truth or Error? Towards Systematic Analysis of Factual Errors in Abstractive Summaries. Klaus-Michael Lux, Maya Sappelli, Martha A. Larson. 1-10 [doi]
- Fill in the BLANC: Human-free Quality Estimation of Document Summaries. Oleg V. Vasilyev, Vedant Dharnidharka, John Bohannon. 11-20 [doi]
- Item Response Theory for Efficient Human Evaluation of Chatbots. João Sedoc, Lyle H. Ungar. 21-33 [doi]
- ViLBERTScore: Evaluating Image Caption Using Vision-and-Language BERT. Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung. 34-39 [doi]
- BLEU Neighbors: A Reference-less Approach to Automatic Evaluation. Kawin Ethayarajh, Dorsa Sadigh. 40-50 [doi]
- Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance. Xi Chen, Nan Ding, Tomer Levinboim, Radu Soricut. 51-59 [doi]
- On the Evaluation of Machine Translation n-best Lists. Jacob Bremerman, Huda Khayrallah, Douglas W. Oard, Matt Post. 60-68 [doi]
- Artemis: A Novel Annotation Methodology for Indicative Single Document Summarization. Rahul Jha, Keping Bi, Yang Li, Mahdi Pakdaman, Asli Celikyilmaz, Ivan Zhiboedov, Kieran McDonald. 69-78 [doi]
- Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models. Reda Yacouby, Dustin Axman. 79-91 [doi]
- A Survey on Recognizing Textual Entailment as an NLP Evaluation. Adam Poliak. 92-109 [doi]
- Grammaticality and Language Modelling. Jingcheng Niu, Gerald Penn. 110-119 [doi]
- One of These Words Is Not Like the Other: A Reproduction of Outlier Identification Using Non-contextual Word Representations. Jesper Brink Andersen, Mikkel Bak Bertelsen, Mikkel Hørby Schou, Manuel R. Ciosici, Ira Assent. 120-130 [doi]
- Are Some Words Worth More than Others? Shiran Dudy, Steven Bedrick. 131-142 [doi]
- On Aligning OpenIE Extractions with Knowledge Bases: A Case Study. Kiril Gashteovski, Rainer Gemulla, Bhushan Kotnis, Sven Hertling, Christian Meilicke. 143-154 [doi]
- ClusterDataSplit: Exploring Challenging Clustering-Based Data Splits for Model Performance Evaluation. Hanna Wecker, Annemarie Friedrich, Heike Adel. 155-163 [doi]
- Best Practices for Crowd-based Evaluation of German Summarization: Comparing Crowd, Expert and Automatic Evaluation. Neslihan Iskender, Tim Polzehl, Sebastian Möller. 164-175 [doi]
- Evaluating Word Embeddings on Low-Resource Languages. Nathan Stringham, Mike Izbicki. 176-186 [doi]