Eval4NLP (eval4nlp)
Publications
75 publications in total.
2023
Reference-Free Summarization Evaluation with Large Language Models. Abbas Akkasi, Kathleen C. Fraser, Majid Komeili. eval4nlp 2023: 193-201 [doi]
LTRC_IIITH's 2023 Submission for Prompting Large Language Models as Explainable Metrics Task. Pavan Baswani, Ananya Mukherjee, Manish Shrivastava 0001. eval4nlp 2023: 156-163 [doi]
Large Language Models As Annotators: A Preliminary Evaluation For Annotating Low-Resource Language Content. Savita Bhat, Vasudeva Varma. eval4nlp 2023: 100-107 [doi]
Summary Cycles: Exploring the Impact of Prompt Engineering on Large Language Models' Interaction with Interaction Log Information. Jeremy Block, Yu-Peng Chen, Abhilash Budharapu, Lisa Anthony, Bonnie J. Dorr. eval4nlp 2023: 85-99 [doi]
Transformers Go for the LOLs: Generating (Humourous) Titles from Scientific Abstracts End-to-End. Yanran Chen, Steffen Eger. eval4nlp 2023: 62-84 [doi]
Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2023, Bali, Indonesia, November 1, 2023. Daniel Deutsch, Rotem Dror, Steffen Eger, Yang Gao 0021, Christoph Leiter, Juri Opitz, Andreas Rücklé, editors. Association for Computational Linguistics, 2023. [doi]
Can a Prediction's Rank Offer a More Accurate Quantification of Bias? A Case Study Measuring Sexism in Debiased Language Models. Jad Doughman, Shady Shehata, Leen Al Qadi, Youssef Nafea, Fakhri Karray. eval4nlp 2023: 108-116 [doi]
Which is better? Exploring Prompting Strategy For LLM-based Metrics. Joonghoon Kim, Sangmin Lee, Seung-Hun Han, Saeran Park, Jiyoon Lee, Kiyoon Jeong, Pilsung Kang 0001. eval4nlp 2023: 164-183 [doi]
EduQuick: A Dataset Toward Evaluating Summarization of Informal Educational Content for Social Media. Zahra Kolagar, Sebastian Steindl, Alessandra Zarcone. eval4nlp 2023: 32-48 [doi]
Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task. Neema Kotonya, Saran Krishnasamy, Joel R. Tetreault, Alejandro Jaimes. eval4nlp 2023: 202-218 [doi]
Team NLLG submission for Eval4NLP 2023 Shared Task: Retrieval-Augmented In-Context Learning for NLG Evaluation. Daniil Larionov, Vasiliy Viskov, George Kokush, Alexander Panchenko, Steffen Eger. eval4nlp 2023: 228-234 [doi]
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics. Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao 0021, Rotem Dror, Steffen Eger. eval4nlp 2023: 117-138 [doi]
Characterised LLMs Affect its Evaluation of Summary and Translation. Yuan Lu, Yu-Ting Lin. eval4nlp 2023: 184-192 [doi]
Exploring Prompting Large Language Models as Explainable Metrics. Ghazaleh Mahmoudi. eval4nlp 2023: 219-227 [doi]
Understanding Large Language Model Based Metrics for Text Summarization. Abhishek Pradhan, Ketan Kumar Todi. eval4nlp 2023: 149-155 [doi]
Assessing Distractors in Multiple-Choice Tests. Vatsal Raina, Adian Liusie, Mark J. F. Gales. eval4nlp 2023: 12-22 [doi]
Zero-shot Probing of Pretrained Language Models for Geography Knowledge. Nitin Ramrakhiyani, Vasudeva Varma, Girish K. Palshikar, Sachin Pawar. eval4nlp 2023: 49-61 [doi]
Delving into Evaluation Metrics for Generation: A Thorough Assessment of How Metrics Generalize to Rephrasing Across Languages. Yixuan Wang, Qingyan Chen, Duygu Ataman. eval4nlp 2023: 23-31 [doi]
WRF: Weighted Rouge-F1 Metric for Entity Recognition. Lukas Weber, Krishnan Jothi Ramalingam, Matthias Beyer, Axel Zimmermann 0005. eval4nlp 2023: 1-11 [doi]
HIT-MI&T Lab's Submission to Eval4NLP 2023 Shared Task. Rui Zhang, Fuhai Song, Hui Huang, Jinghao Yuan, Muyun Yang, Tiejun Zhao. eval4nlp 2023: 139-148 [doi]
2022
Why is sentence similarity benchmark not predictive of application-oriented task performance? Kaori Abe, Sho Yokoi, Tomoyuki Kajiwara, Kentaro Inui. eval4nlp 2022: 70-87 [doi]
Assessing Neural Referential Form Selectors on a Realistic Multilingual Dataset. Guanyi Chen, Fahime Same, Kees van Deemter. eval4nlp 2022: 103-114 [doi]
GLARE: Generative Left-to-right AdversaRial Examples. Ryan Chi, Nathan Kim, Patrick Liu, Zander Lack, Ethan A. Chi. eval4nlp 2022: 44-50 [doi]
Proceedings of the 3rd Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2022, Online, November 20, 2022. Daniel Deutsch, Can Udomcharoenchaikit, Juri Opitz, Yang Gao 0021, Marina Fomicheva, Steffen Eger, editors. Association for Computational Linguistics, 2022. [doi]
A Comparative Analysis of Stance Detection Approaches and Datasets. Parush Gera, Tempestt J. Neal. eval4nlp 2022: 58-69 [doi]
A Japanese Corpus of Many Specialized Domains for Word Segmentation and Part-of-Speech Tagging. Shohei Higashiyama, Masao Ideuchi, Masao Utiyama, Yoshiaki Oida, Eiichiro Sumita. eval4nlp 2022: 1-10 [doi]
From COMET to COMES - Can Summary Evaluation Benefit from Translation Evaluation? Mateusz Krubiński, Pavel Pecina. eval4nlp 2022: 21-31 [doi]
Chat Translation Error Detection for Assisting Cross-lingual Communications. Yunmeng Li, Jun Suzuki, Makoto Morishita, Kaori Abe, Ryoko Tokuhisa, Ana Brassard, Kentaro Inui. eval4nlp 2022: 88-95 [doi]
Better Smatch = Better Parser? AMR evaluation is not so simple anymore. Juri Opitz, Anette Frank. eval4nlp 2022: 32-43 [doi]
Evaluating the role of non-lexical markers in GPT-2's language modeling behavior. Roberta Rocca, Alejandro de la Vega. eval4nlp 2022: 96-102 [doi]
Random Text Perturbations Work, but not Always. Zhengxiang Wang. eval4nlp 2022: 51-57 [doi]
Assessing Resource-Performance Trade-off of Natural Language Models using Data Envelopment Analysis. Shohei Zhou, Alisha Zachariah, Devin Conathan, Jeffery Kline. eval4nlp 2022: 11-20 [doi]
2021
ESTIME: Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings. Oleg V. Vasilyev 0001, John Bohannon. eval4nlp 2021: 94-103 [doi]
Validating Label Consistency in NER Data Annotation. Qingkai Zeng 0001, Mengxia Yu, Wenhao Yu 0002, Tianwen Jiang, Meng Jiang 0001. eval4nlp 2021: 11-15 [doi]
MIPE: A Metric Independent Pipeline for Effective Code-Mixed NLG Evaluation. Ayush Garg 0001, Sammed S. Kagi, Vivek Srivastava, Mayank Singh 0001. eval4nlp 2021: 123-132 [doi]
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2021, Punta Cana, Dominican Republic, November 10, 2021. Yang Gao 0021, Steffen Eger, Wei Zhao 0033, Piyawat Lertvittayakumjorn, Marina Fomicheva, editors. Association for Computational Linguistics, 2021. [doi]
Statistically Significant Detection of Semantic Shifts using Contextual Word Embeddings. Yang Liu 0254, Alan Medlar, Dorota Glowacka. eval4nlp 2021: 104-113 [doi]
Error-Sensitive Evaluation for Ordinal Target Variables. David Chen, Maury Courtland, Adam Faulkner, Aysu Ezen-Can. eval4nlp 2021: 189-199 [doi]
Evaluation of Unsupervised Automatic Readability Assessors Using Rank Correlations. Yo Ehara. eval4nlp 2021: 62-72 [doi]
Explaining Errors in Machine Translation with Absolute Gradient Ensembles. Melda Eksi, Erik Gelbing, Jonathan Stieber, Chi Viet Vu. eval4nlp 2021: 238-249 [doi]
The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results. Marina Fomicheva, Piyawat Lertvittayakumjorn, Wei Zhao 0033, Steffen Eger, Yang Gao 0021. eval4nlp 2021: 165-178 [doi]
Trainable Ranking Models to Evaluate the Semantic Accuracy of Data-to-Text Neural Generator. Nicolas Garneau, Luc Lamontagne. eval4nlp 2021: 51-61 [doi]
Differential Evaluation: a Qualitative Analysis of Natural Language Processing System Behavior Based Upon Data Resistance to Processing. Lucie Gianola, Hicham El Boukkouri, Cyril Grouin, Thomas Lavergne, Patrick Paroubek, Pierre Zweigenbaum. eval4nlp 2021: 1-10 [doi]
The UMD Submission to the Explainable MT Quality Estimation Shared Task: Combining Explanation Models with Sequence Labeling. Tasnim Kabir, Marine Carpuat. eval4nlp 2021: 230-237 [doi]
How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task. Urja Khurana, Eric T. Nalisnick, Antske Fokkens. eval4nlp 2021: 16-31 [doi]
Reference-Free Word- and Sentence-Level Translation Evaluation with Token-Matching Metrics. Christoph Wolfgang Leiter. eval4nlp 2021: 157-164 [doi]
Testing Cross-Database Semantic Parsers With Canonical Utterances. Heather Lent, Semih Yavuz, Tao Yu, Tong Niu, Yingbo Zhou, Dragomir Radev, Xi Victoria Lin. eval4nlp 2021: 73-83 [doi]
Referenceless Parsing-Based Evaluation of AMR-to-English Generation. Emma Manning, Nathan Schneider 0001. eval4nlp 2021: 114-122 [doi]
Developing a Benchmark for Reducing Data Bias in Authorship Attribution. Benjamin Murauer, Günther Specht. eval4nlp 2021: 179-188 [doi]
SeqScore: Addressing Barriers to Reproducible Named Entity Recognition Evaluation. Chester Palen-Michel, Nolan Holley, Constantine Lignos. eval4nlp 2021: 40-50 [doi]
Explainable Quality Estimation: CUNI Eval4NLP Submission. Peter Polák, Muskaan Singh, Ondrej Bojar. eval4nlp 2021: 250-255 [doi]
Error Identification for Machine Translation with Metric Embedding and Attention. Raphael Rubino, Atsushi Fujita, Benjamin Marie. eval4nlp 2021: 146-156 [doi]
HinGE: A Dataset for Generation and Evaluation of Code-Mixed Hinglish Text. Vivek Srivastava, Mayank Singh 0001. eval4nlp 2021: 200-208 [doi]
Writing Style Author Embedding Evaluation. Enzo Terreau, Antoine Gourru, Julien Velcin. eval4nlp 2021: 84-93 [doi]
StoryDB: Broad Multi-language Narrative Dataset. Alexey Tikhonov, Igor Samenko, Ivan P. Yamshchikov. eval4nlp 2021: 32-39 [doi]
IST-Unbabel 2021 Submission for the Explainable Quality Estimation Shared Task. Marcos V. Treviso, Nuno Miguel Guerreiro, Ricardo Rei, André F. T. Martins. eval4nlp 2021: 133-145 [doi]
What is SemEval evaluating? A Systematic Analysis of Evaluation Campaigns in NLP. Oskar Wysocki, Malina Florea, Dónal Landers, André Freitas. eval4nlp 2021: 209-229 [doi]
2020
Fill in the BLANC: Human-free quality estimation of document summaries. Oleg V. Vasilyev 0001, Vedant Dharnidharka, John Bohannon. eval4nlp 2020: 11-20 [doi]
Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance. Xi Chen 0071, Nan Ding 0002, Tomer Levinboim, Radu Soricut. eval4nlp 2020: 51-59 [doi]
One of these words is not like the other: a reproduction of outlier identification using non-contextual word representations. Jesper Brink Andersen, Mikkel Bak Bertelsen, Mikkel Hørby Schou, Manuel R. Ciosici, Ira Assent. eval4nlp 2020: 120-130 [doi]
On the Evaluation of Machine Translation n-best Lists. Jacob Bremerman, Huda Khayrallah, Douglas W. Oard, Matt Post. eval4nlp 2020: 60-68 [doi]
Are Some Words Worth More than Others? Shiran Dudy, Steven Bedrick. eval4nlp 2020: 131-142 [doi]
Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2020, Online, November 20, 2020. Steffen Eger, Yang Gao 0021, Maxime Peyrard, Wei Zhao 0033, Eduard H. Hovy, editors. Association for Computational Linguistics, 2020. [doi]
BLEU Neighbors: A Reference-less Approach to Automatic Evaluation. Kawin Ethayarajh, Dorsa Sadigh. eval4nlp 2020: 40-50 [doi]
On Aligning OpenIE Extractions with Knowledge Bases: A Case Study. Kiril Gashteovski, Rainer Gemulla, Bhushan Kotnis, Sven Hertling, Christian Meilicke. eval4nlp 2020: 143-154 [doi]
Best Practices for Crowd-based Evaluation of German Summarization: Comparing Crowd, Expert and Automatic Evaluation. Neslihan Iskender, Tim Polzehl, Sebastian Möller 0001. eval4nlp 2020: 164-175 [doi]
Artemis: A Novel Annotation Methodology for Indicative Single Document Summarization. Rahul Jha, Keping Bi, Yang Li, Mahdi Pakdaman, Asli Celikyilmaz, Ivan Zhiboedov, Kieran McDonald. eval4nlp 2020: 69-78 [doi]
ViLBERTScore: Evaluating Image Caption Using Vision-and-Language BERT. Hwanhee Lee, Seunghyun Yoon 0002, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung. eval4nlp 2020: 34-39 [doi]
Truth or Error? Towards systematic analysis of factual errors in abstractive summaries. Klaus-Michael Lux, Maya Sappelli, Martha A. Larson. eval4nlp 2020: 1-10 [doi]
Grammaticality and Language Modelling. Jingcheng Niu, Gerald Penn. eval4nlp 2020: 110-119 [doi]
A survey on Recognizing Textual Entailment as an NLP Evaluation. Adam Poliak. eval4nlp 2020: 92-109 [doi]
Item Response Theory for Efficient Human Evaluation of Chatbots. João Sedoc, Lyle H. Ungar. eval4nlp 2020: 21-33 [doi]
Evaluating Word Embeddings on Low-Resource Languages. Nathan Stringham, Mike Izbicki. eval4nlp 2020: 176-186 [doi]
ClusterDataSplit: Exploring Challenging Clustering-Based Data Splits for Model Performance Evaluation. Hanna Wecker, Annemarie Friedrich, Heike Adel. eval4nlp 2020: 155-163 [doi]
Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models. Reda Yacouby, Dustin Axman. eval4nlp 2020: 79-91 [doi]