Agreement is overrated: A plea for correlation to assess human evaluation reliability

Jacopo Amidei, Paul Piwek, Alistair Willis. Agreement is overrated: A plea for correlation to assess human evaluation reliability. In Kees van Deemter, Chenghua Lin, Hiroya Takamura, editors, Proceedings of the 12th International Conference on Natural Language Generation, INLG 2019, Tokyo, Japan, October 29 - November 1, 2019, pages 344-354. Association for Computational Linguistics, 2019.

@inproceedings{AmideiPW19,
  title = {Agreement is overrated: A plea for correlation to assess human evaluation reliability},
  author = {Jacopo Amidei and Paul Piwek and Alistair Willis},
  year = {2019},
  url = {https://aclweb.org/anthology/papers/W/W19/W19-8642/},
  researchr = {https://researchr.org/publication/AmideiPW19},
  cites = {0},
  citedby = {0},
  pages = {344--354},
  booktitle = {Proceedings of the 12th International Conference on Natural Language Generation, INLG 2019, Tokyo, Japan, October 29 - November 1, 2019},
  editor = {Kees van Deemter and Chenghua Lin and Hiroya Takamura},
  publisher = {Association for Computational Linguistics},
  isbn = {978-1-950737-94-9},
}