- Differential Evaluation: a Qualitative Analysis of Natural Language Processing System Behavior Based Upon Data Resistance to Processing. Lucie Gianola, Hicham El Boukkouri, Cyril Grouin, Thomas Lavergne, Patrick Paroubek, Pierre Zweigenbaum. 1-10
- Validating Label Consistency in NER Data Annotation. Qingkai Zeng, Mengxia Yu, Wenhao Yu, Tianwen Jiang, Meng Jiang. 11-15
- How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task. Urja Khurana, Eric T. Nalisnick, Antske Fokkens. 16-31
- StoryDB: Broad Multi-language Narrative Dataset. Alexey Tikhonov, Igor Samenko, Ivan P. Yamshchikov. 32-39
- SeqScore: Addressing Barriers to Reproducible Named Entity Recognition Evaluation. Chester Palen-Michel, Nolan Holley, Constantine Lignos. 40-50
- Trainable Ranking Models to Evaluate the Semantic Accuracy of Data-to-Text Neural Generator. Nicolas Garneau, Luc Lamontagne. 51-61
- Evaluation of Unsupervised Automatic Readability Assessors Using Rank Correlations. Yo Ehara. 62-72
- Testing Cross-Database Semantic Parsers With Canonical Utterances. Heather Lent, Semih Yavuz, Tao Yu, Tong Niu, Yingbo Zhou, Dragomir Radev, Xi Victoria Lin. 73-83
- Writing Style Author Embedding Evaluation. Enzo Terreau, Antoine Gourru, Julien Velcin. 84-93
- ESTIME: Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings. Oleg V. Vasilyev, John Bohannon. 94-103
- Statistically Significant Detection of Semantic Shifts using Contextual Word Embeddings. Yang Liu, Alan Medlar, Dorota Glowacka. 104-113
- Referenceless Parsing-Based Evaluation of AMR-to-English Generation. Emma Manning, Nathan Schneider. 114-122
- MIPE: A Metric Independent Pipeline for Effective Code-Mixed NLG Evaluation. Ayush Garg, Sammed S. Kagi, Vivek Srivastava, Mayank Singh. 123-132
- IST-Unbabel 2021 Submission for the Explainable Quality Estimation Shared Task. Marcos V. Treviso, Nuno Miguel Guerreiro, Ricardo Rei, André F. T. Martins. 133-145
- Error Identification for Machine Translation with Metric Embedding and Attention. Raphael Rubino, Atsushi Fujita, Benjamin Marie. 146-156
- Reference-Free Word- and Sentence-Level Translation Evaluation with Token-Matching Metrics. Christoph Wolfgang Leiter. 157-164
- The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results. Marina Fomicheva, Piyawat Lertvittayakumjorn, Wei Zhao, Steffen Eger, Yang Gao. 165-178
- Developing a Benchmark for Reducing Data Bias in Authorship Attribution. Benjamin Murauer, Günther Specht. 179-188
- Error-Sensitive Evaluation for Ordinal Target Variables. David Chen, Maury Courtland, Adam Faulkner, Aysu Ezen-Can. 189-199
- HinGE: A Dataset for Generation and Evaluation of Code-Mixed Hinglish Text. Vivek Srivastava, Mayank Singh. 200-208
- What is SemEval evaluating? A Systematic Analysis of Evaluation Campaigns in NLP. Oskar Wysocki, Malina Florea, Dónal Landers, André Freitas. 209-229
- The UMD Submission to the Explainable MT Quality Estimation Shared Task: Combining Explanation Models with Sequence Labeling. Tasnim Kabir, Marine Carpuat. 230-237
- Explaining Errors in Machine Translation with Absolute Gradient Ensembles. Melda Eksi, Erik Gelbing, Jonathan Stieber, Chi Viet Vu. 238-249
- Explainable Quality Estimation: CUNI Eval4NLP Submission. Peter Polák, Muskaan Singh, Ondrej Bojar. 250-255