- Neural Mention Detection. Juntao Yu, Bernd Bohnet, Massimo Poesio. 1-10
- A Cluster Ranking Model for Full Anaphora Resolution. Juntao Yu, Alexandra Uma, Massimo Poesio. 11-20
- Mandarinograd: A Chinese Collection of Winograd Schemas. Timothée Bernard, Ting Han. 21-26
- On the Influence of Coreference Resolution on Word Embeddings in Lexical-semantic Evaluation Tasks. Alexander Henlein, Alexander Mehler. 27-33
- NoEl: An Annotated Corpus for Noun Ellipsis in English. Payal Khullar, Kushal Majmundar, Manish Shrivastava. 34-43
- An Annotated Dataset of Coreference in English Literature. David Bamman, Olivia Lewke, Anya Mansoor. 44-54
- GerDraCor-Coref: A Coreference Corpus for Dramatic Texts in German. Janis Pagel, Nils Reiter. 55-64
- A Study on Entity Resolution for Email Conversations. Parag Dakle, Takshak Desai, Dan I. Moldovan. 65-73
- Model-based Annotation of Coreference. Rahul Aralikatte, Anders Søgaard. 74-79
- French Coreference for Spoken and Written Language. Rodrigo Wilkens, Bruno Oberle, Frédéric Landragin, Amalia Todirascu. 80-89
- Cross-lingual Zero Pronoun Resolution. Abdulrahman Aloraini, Massimo Poesio. 90-98
- Exploiting Cross-Lingual Hints to Discover Event Pronouns. Sharid Loáiciga, Christian Hardmeier, Asad Sayeed. 99-103
- MuDoCo: Corpus for Multidomain Coreference Resolution and Referring Expression Generation. Scott Martin, Shivani Poddar, Kartikeya Upasani. 104-111
- Affection Driven Neural Networks for Sentiment Analysis. Rong Xiang, Yunfei Long, Mingyu Wan, Jinghang Gu, Qin Lu, Chu-Ren Huang. 112-119
- The Alice Datasets: fMRI & EEG Observations of Natural Language Comprehension. Shohini Bhattasali, Jonathan Brennan, Wen-Ming Luh, Berta Franzluebbers, John Hale. 120-125
- Modelling Narrative Elements in a Short Story: A Study on Annotation Schemes and Guidelines. Elena Mikhalkova, Timofei Protasov, Polina Sokolova, Anastasiia Bashmakova, Anastasiia Drozdova. 126-132
- Cortical Speech Databases For Deciphering the Articulatory Code. Harald Höge. 133-137
- ZuCo 2.0: A Dataset of Physiological Recordings During Natural Reading and Annotation. Nora Hollenstein, Marius Troendle, Ce Zhang, Nicolas Langer. 138-146
- Linguistic, Kinematic and Gaze Information in Task Descriptions: The LKG-Corpus. Tim Reinboth, Stephanie Gross, Laura Bishop, Brigitte Krenn. 147-155
- The ACQDIV Corpus Database and Aggregation Pipeline. Anna Jancso, Steven Moran, Sabine Stoll. 156-165
- Providing Semantic Knowledge to a Set of Pictograms for People with Disabilities: a Set of Links between WordNet and Arasaac: Arasaac-WN. Didier Schwab, Pauline Trial, Céline Vaschalde, Loïc Vial, Emmanuelle Esperança-Rodier, Benjamin Lecouteux. 166-171
- Orthographic Codes and the Neighborhood Effect: Lessons from Information Theory. Stéphan Tulkens, Dominiek Sandra, Walter Daelemans. 172-181
- Understanding the Dynamics of Second Language Writing through Keystroke Logging and Complexity Contours. Elma Kerz, Fabio Pruneri, Daniel Wiechmann, Yu Qiao, Marcus Ströbel. 182-188
- Design of BCCWJ-EEG: Balanced Corpus with Human Electroencephalography. Yohei Oseki, Masayuki Asahara. 189-194
- Using the RUPEX Multichannel Corpus in a Pilot fMRI Study on Speech Disfluencies. Katerina Smirnova, Nikolay Korotaev, Yana Panikratova, Irina Lebedeva, Ekaterina Pechenkova, Olga Fedorova. 195-203
- Construction of an Evaluation Corpus for Grammatical Error Correction for Learners of Japanese as a Second Language. Aomi Koyama, Tomoshige Kiyuna, Kenji Kobayashi, Mio Arai, Mamoru Komachi. 204-211
- Effective Crowdsourcing of Multiple Tasks for Comprehensive Knowledge Extraction. Sangha Nam, Minho Lee, Donghwan Kim, Kijong Han, Kuntae Kim, Sooji Yoon, Eun-kyung Kim, Key-Sun Choi. 212-219
- Developing a Corpus of Indirect Speech Act Schemas. Antonio Roque, Alexander Tsuetaki, Vasanth Sarathy, Matthias Scheutz. 220-228
- Quality Estimation for Partially Subjective Classification Tasks via Crowdsourcing. Yoshinao Sato, Kouki Miyazawa. 229-235
- Crowdsourcing in the Development of a Multilingual FrameNet: A Case Study of Korean FrameNet. YoungGyun Hahm, Youngbin Noh, Jiyoon Han, Tae-Hwan Oh, Hyonsu Choe, Hansaem Kim, Key-Sun Choi. 236-244
- Towards a Reliable and Robust Methodology for Crowd-Based Subjective Quality Assessment of Query-Based Extractive Text Summarization. Neslihan Iskender, Tim Polzehl, Sebastian Möller. 245-253
- A Seed Corpus of Hindu Temples in India. Priya Radhakrishnan. 254-258
- Do You Believe It Happened? Assessing Chinese Readers' Veridicality Judgments. Yu-Yun Chang, Shu-Kai Hsieh. 259-267
- Creating Expert Knowledge by Relying on Language Learners: a Generic Approach for Mass-Producing Language Resources by Combining Implicit Crowdsourcing and Language Learning. Lionel Nicolas, Verena Lyding, Claudia Borg, Corina Forascu, Karën Fort, Katerina Zdravkova, Iztok Kosem, Jaka Cibej, Spela Arhar Holdt, Alice Millour, Alexander König, Christos T. Rodosthenous, Federico Sangati, Umair ul Hassan, Anisia Katinskaia, Anabela Barreiro, Lavinia Aparaschivei, Yaakov HaCohen-Kerner. 268-278
- MAGPIE: A Large Corpus of Potentially Idiomatic Expressions. Hessel Haagsma, Johan Bos, Malvina Nissim. 279-287
- CRWIZ: A Framework for Crowdsourcing Real-Time Wizard-of-Oz Dialogues. Francisco Javier Chiyah Garcia, José Lopes, Xingkun Liu, Helen F. Hastie. 288-297
- Effort Estimation in Named Entity Tagging Tasks. Inês Gomes, Rui Correia, Jorge Ribeiro, João Freitas. 298-306
- Using Crowdsourced Exercises for Vocabulary Training to Expand ConceptNet. Christos T. Rodosthenous, Verena Lyding, Federico Sangati, Alexander König, Umair ul Hassan, Lionel Nicolas, Jolita Horbacauskiene, Anisia Katinskaia, Lavinia Aparaschivei. 307-316
- Predicting Multidimensional Subjective Ratings of Children's Readings from the Speech Signals for the Automatic Assessment of Fluency. Gérard Bailly, Erika Godde, Anne-Laure Piat-Marchand, Marie-Line Bosse. 317-322
- Constructing Multimodal Language Learner Texts Using LARA: Experiences with Nine Languages. Elham Akhlaghi, Branislav Bédi, Fatih Bektas, Harald Berthelsen, Matthias Butterweck, Cathy Chua, Catia Cucchiarini, Gülsen Eryigit, Johanna Gerlach, Hanieh Habibi, Neasa Ní Chiaráin, Manny Rayner, Steinþór Steingrímsson, Helmer Strik. 323-331
- A Dataset for Investigating the Impact of Feedback on Student Revision Outcome. Ildikó Pilán, John Lee, Chak Yan Yeung, Jonathan Webster. 332-339
- Creating Corpora for Research in Feedback Comment Generation. Ryo Nagata, Kentaro Inui, Shin'ichiro Ishikawa. 340-345
- Using Multilingual Resources to Evaluate CEFRLex for Learner Applications. Johannes Graën, David Alfter, Gerold Schneider. 346-355
- Immersive Language Exploration with Object Recognition and Augmented Reality. Benny Platte, Anett Platte, Christian Roschke, Rico Thomanek, Tony Rolletschke, Frank Zimmer, Marc Ritter. 356-362
- A Process-oriented Dataset of Revisions during Writing. Rianne Conijn, Emily Dux Speltz, Menno van Zaanen, Luuk Van Waes, Evgeny Chukharev-Hudilainen. 363-368
- Automated Writing Support Using Deep Linguistic Parsers. Luís Morgado da Costa, Roger Vivek Placidus Winder, Shu Yun Li, Benedict Christopher Tzer Liang Lin, Joseph MacKinnon, Francis Bond. 369-377
- TLT-school: a Corpus of Non Native Children Speech. Roberto Gretter, Marco Matassoni, Stefano Bannò, Daniele Falavigna. 378-385
- Toward a Paradigm Shift in Collection of Learner Corpora. Anisia Katinskaia, Sardana Ivanova, Roman Yangarber. 386-391
- Quality Focused Approach to a Learner Corpus Development. Roberts Dargis, Ilze Auzina, Kristine Levane-Petrova, Inga Kaija. 392-396
- An Exploratory Study into Automated Précis Grading. Orphée De Clercq, Senne Van Hoecke. 397-404
- Adjusting Image Attributes of Localized Regions with Low-level Dialogue. Tzu-Hsiang Lin, Alexander I. Rudnicky, Trung Bui, Doo Soon Kim, Jean Oh. 405-412
- Alignment Annotation for Clinic Visit Dialogue to Clinical Note Sentence Language Generation. Wen-wai Yim, Meliha Yetisgen, Jenny Huang, Micah Grossman. 413-421
- MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines. Mihail Eric, Rahul Goel, Shachi Paul, Abhishek Sethi, Sanchit Agarwal, Shuyang Gao, Adarsh Kumar, Anuj Kumar Goyal, Peter Ku, Dilek Hakkani-Tür. 422-428
- A Comparison of Explicit and Implicit Proactive Dialogue Strategies for Conversational Recommendation. Matthias Kraus, Fabian Fischbach, Pascal Jansen, Wolfgang Minker. 429-435
- Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque. Arantxa Otegi, Aitor Gonzalez-Agirre, Jon Ander Campos, Aitor Soroa, Eneko Agirre. 436-442
- Construction and Analysis of a Multimodal Chat-talk Corpus for Dialog Systems Considering Interpersonal Closeness. Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito. 443-448
- BLISS: An Agent for Collecting Spoken Dialogue Data about Health and Well-being. Jelte van Waterschoot, Iris Hendrickx, Arif Khan, Esther Klabbers, Marcel de Korte, Helmer Strik, Catia Cucchiarini, Mariët Theune. 449-458
- The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service. Meng Chen, Ruixue Liu, Lei Shen, Shaozu Yuan, Jingyan Zhou, Youzheng Wu, Xiaodong He, Bowen Zhou. 459-466
- "Cheese!": a Corpus of Face-to-face French Interactions. A Case Study for Analyzing Smiling and Conversational Humor. Béatrice Priego-Valverde, Brigitte Bigi, Mary Amoyal. 467-475
- The Margarita Dialogue Corpus: A Data Set for Time-Offset Interactions and Unstructured Dialogue Systems. Alberto Chierici, Nizar Habash, Margarita Bicec. 476-484
- How Users React to Proactive Voice Assistant Behavior While Driving. Maria Schmidt, Wolfgang Minker, Steffen Werner. 485-490
- Emotional Speech Corpus for Persuasive Dialogue System. Sara Asai, Koichiro Yoshino, Seitaro Shinagawa, Sakriani Sakti, Satoshi Nakamura. 491-497
- Multimodal Analysis of Cohesion in Multi-party Interactions. Reshmashree Bangalore Kantharaju, Caroline Langlet, Mukesh Barange, Chloé Clavel, Catherine Pelachaud. 498-507
- Treating Dialogue Quality Evaluation as an Anomaly Detection Problem. Rostislav Nedelchev, Ricardo Usbeck, Jens Lehmann. 508-512
- Evaluation of Argument Search Approaches in the Context of Argumentative Dialogue Systems. Niklas Rach, Yuki Matsuda, Johannes Daxenberger, Stefan Ultes, Keiichi Yasumoto, Wolfgang Minker. 513-522
- PATE: A Corpus of Temporal Expressions for the In-car Voice Assistant Domain. Alessandra Zarcone, Touhidul Alam, Zahra Kolagar. 523-530
- Mapping the Dialog Act Annotations of the LEGO Corpus into ISO 24617-2 Communicative Functions. Eugénio Ribeiro, Ricardo Ribeiro, David Martins de Matos. 531-539
- Estimating User Communication Styles for Spoken Dialogue Systems. Juliana Miehle, Isabel Feustel, Julia Hornauer, Wolfgang Minker, Stefan Ultes. 540-548
- The ISO Standard for Dialogue Act Annotation, Second Edition. Harry Bunt, Volha Petukhova, Emer Gilmartin, Catherine Pelachaud, Alex Chengyu Fang, Simon Keizer, Laurent Prévot. 549-558
- The AICO Multimodal Corpus - Data Collection and Preliminary Analyses. Kristiina Jokinen. 559-564
- A Corpus of Controlled Opinionated and Knowledgeable Movie Discussions for Training Neural Conversation Models. Fabian Galetzka, Chukwuemeka Uchenna Eneh, David Schlangen. 565-573
- A French Medical Conversations Corpus Annotated for a Virtual Patient Dialogue System. Fréjus A. A. Laleye, Gaël de Chalendar, Antonia Blanié, Antoine Brouquet, Dan Behnamou. 574-580
- Getting To Know You: User Attribute Extraction from Dialogues. Chien-Sheng Wu, Andrea Madotto, Zhaojiang Lin, Peng Xu, Pascale Fung. 581-589
- Augmenting Small Data to Classify Contextualized Dialogue Acts for Exploratory Visualization. Abhinav Kumar, Barbara Di Eugenio, Jillian Aurisano, Andrew E. Johnson. 590-599
- RDG-Map: A Multimodal Corpus of Pedagogical Human-Agent Spoken Interactions. Maike Paetzel, Deepthi Karkada, Ramesh R. Manuvinakurike. 600-609
- MPDD: A Multi-Party Dialogue Dataset for Analysis of Emotions and Interpersonal Relationships. Yi-Ting Chen, Hen-Hsen Huang, Hsin-Hsi Chen. 610-614
- "Alexa in the wild" - Collecting Unconstrained Conversations with a Modern Voice Assistant in a Public Environment. Ingo Siegert. 615-619
- EDA: Enriching Emotional Dialogue Acts using an Ensemble of Neural Annotators. Chandrakant Bothe, Cornelius Weber, Sven Magg, Stefan Wermter. 620-627
- PACO: a Corpus to Analyze the Impact of Common Ground in Spontaneous Face-to-Face Interaction. Mary Amoyal, Béatrice Priego-Valverde, Stéphane Rauzy. 628-633
- Dialogue Act Annotation in a Multimodal Corpus of First Encounter Dialogues. Costanza Navarretta, Patrizia Paggio. 634-643
- A Conversation-Analytic Annotation of Turn-Taking Behavior in Japanese Multi-Party Conversation and its Preliminary Analysis. Mika Enomoto, Yasuharu Den, Yuichi Ishimoto. 644-652
- Understanding User Utterances in a Dialog System for Caregiving. Yoshihiko Asao, Julien Kloetzer, Junta Mizuno, Dai Saiki, Kazuma Kadowaki, Kentaro Torisawa. 653-661
- Designing Multilingual Interactive Agents using Small Dialogue Corpora. Donghui Lin, Masayuki Otani, Ryosuke Okuno, Toru Ishida. 662-667
- Multimodal Corpus of Bidirectional Conversation of Human-human and Human-robot Interaction during fMRI Scanning. Birgit Rauchbauer, Youssef Hmamouche, Brigitte Bigi, Laurent Prévot, Magalie Ochs, Thierry Chaminade. 668-675
- The Brain-IHM Dataset: a New Resource for Studying the Brain Basis of Human-Human and Human-Machine Conversations. Magalie Ochs, Roxane Bertrand, Aurélie Goujon, Deirdre Bolger, Anne Sophie Dubarry, Philippe Blache. 676-683
- Dialogue-AMR: Abstract Meaning Representation for Dialogue. Claire Bonial, Lucia Donatelli, Mitchell Abrams, Stephanie M. Lukin, Stephen Tratz, Matthew Marge, Ron Artstein, David R. Traum, Clare R. Voss. 684-695
- Relation between Degree of Empathy for Narrative Speech and Type of Responsive Utterance in Attentive Listening. Koichiro Ito, Masaki Murata, Tomohiro Ohno, Shigeki Matsubara. 696-701
- Intent Recognition in Doctor-Patient Interviews. Robin Rojowiec, Benjamin Roth, Maximilian Fink. 702-709
- BrainPredict: a Tool for Predicting and Visualising Local Brain Activity. Youssef Hmamouche, Laurent Prévot, Magalie Ochs, Thierry Chaminade. 710-716
- MTSI-BERT: A Session-aware Knowledge-based Conversational Agent. Matteo Antonio Senese, Giuseppe Rizzo, Mauro Dragoni, Maurizio Morisio. 717-725
- Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers. Kallirroi Georgila, Carla Gordon, Volodymyr Yanov, David R. Traum. 726-734
- Which Model Should We Use for a Real-World Conversational Dialogue System? a Cross-Language Relevance Model or a Deep Neural Net? Seyed Hossein Alavi, Anton Leuski, David R. Traum. 735-742
- Chinese Whispers: A Multimodal Dataset for Embodied Language Grounding. Dimosthenis Kontogiorgos, Elena Sibirtseva, Joakim Gustafson. 743-749
- AMUSED: A Multi-Stream Vector Representation Method for Use in Natural Dialogue. Gaurav Kumar, Rishabh Joshi, Jaspreet Singh, Promod Yenigalla. 750-758
- An Annotation Approach for Social and Referential Gaze in Dialogue. Vidya Somashekarappa, Christine Howes, Asad Sayeed. 759-765
- A Penn-style Treebank of Middle Low German. Hannah Booth, Anne Breitbarth, Aaron Ecay, Melissa Farasyn. 766-775
- Books of Hours. the First Liturgical Data Set for Text Segmentation. Amir Hazem, Béatrice Daille, Christopher Kermorvant, Dominique Stutzmann, Marie-Laurence Bonhomme, Martin Maarand, Mélodie Boillet. 776-784
- Corpus of Chinese Dynastic Histories: Gender Analysis over Two Millennia. Sergey Zinin, Yang Xu. 785-793
- The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study. Stefan Fischer, Jörg Knappen, Katrin Menzel, Elke Teich. 794-802
- Corpus REDEWIEDERGABE. Annelen Brunner, Stefan Engelberg, Fotis Jannidis, Ngoc Duyen Tanja Tu, Lukas Weimer. 803-812
- WeDH - a Friendly Tool for Building Literary Corpora Enriched with Encyclopedic Metadata. Mattia Egloff, Davide Picca. 813-816
- Automatic Section Recognition in Obituaries. Valentino Sabbatino, Laura Ana Maria Bostan, Roman Klinger. 817-825
- SLäNDa: An Annotated Corpus of Narrative and Dialogue in Swedish Literary Fiction. Sara Stymne, Carin Östman. 826-834
- RiQuA: A Corpus of Rich Quotation Annotation for English Literary Text. Sean Papay, Sebastian Padó. 835-841
- A Corpus Linguistic Perspective on Contemporary German Pop Lyrics with the Multi-Layer Annotated "Songkorpus". Roman Schneider. 842-848
- The BDCamões Collection of Portuguese Literary Documents: a Research Resource for Digital Humanities and Language Technology. Sara Grilo, Márcia Bolrinha, João Silva, Rui Vaz, António Branco. 849-854
- Dataset for Temporal Analysis of English-French Cognates. Esteban Frossard, Mickaël Coustaty, Antoine Doucet, Adam Jatowt, Simon Hengchen. 855-859
- Material Philology Meets Digital Onomastic Lexicography: The NordiCon Database of Medieval Nordic Personal Names in Continental Sources. Michelle Waldispühl, Dana Dannélls, Lars Borin. 860-867
- NLP Scholar: A Dataset for Examining the State of NLP Research. Saif M. Mohammad. 868-877
- The DReaM Corpus: A Multilingual Annotated Corpus of Grammars for the World's Languages. Shafqat Mumtaz Virk, Harald Hammarström, Markus Forsberg, Søren Wichmann. 878-884
- LiViTo: Linguistic and Visual Features Tool for Assisted Analysis of Historic Manuscripts. Klaus Müller, Aleksej Tikhonov, Roland Meyer. 885-890
- TextAnnotator: A UIMA Based Tool for the Simultaneous and Collaborative Annotation of Texts. Giuseppe Abrami, Manuel Stoeckel, Alexander Mehler. 891-900
- Deduplication of Scholarly Documents using Locality Sensitive Hashing and Word Embeddings. Bikash Gyawali, Lucas Anastasiou, Petr Knoth. 901-910
- "Voices of the Great War": A Richly Annotated Corpus of Italian Texts on the First World War. Federico Boschetti, Irene De Felice, Stefano Dei Rossi, Felice dell'Orletta, Michele Di Giorgio, Martina Miliani, Lucia C. Passaro, Angelica Puddu, Giulia Venturi, Nicola Labanca, Alessandro Lenci, Simonetta Montemagni. 911-918
- DEbateNet-mig15: Tracing the 2015 Immigration Debate in Germany Over Time. Gabriella Lapesa, André Blessing, Nico Blokker, Erenay Dayanik, Sebastian Haunss, Jonas Kuhn, Sebastian Padó. 919-927
- A Corpus of Spanish Political Speeches from 1937 to 2019. Elena Alvarez-Mellado. 928-932
- A New Latin Treebank for Universal Dependencies: Charters between Ancient Latin and Romance Languages. Flavio Massimiliano Cecchini, Timo Korkiakangas, Marco Passarotti. 933-942
- Identification of Indigenous Knowledge Concepts through Semantic Networks, Spelling Tools and Word Embeddings. Renato Rocha Souza, Amelie Dorn, Barbara Piringer, Eveline Wandl-Vogt. 943-947
- A Multi-Orthography Parallel Corpus of Yiddish Nouns. Jonne Saleva. 948-952
- An Annotated Corpus of Adjective-Adverb Interfaces in Romance Languages. Katharina Gerhalter, Gerlinde Schneider, Christopher Pollin, Martin Hummel. 953-957
- Language Resources for Historical Newspapers: the Impresso Collection. Maud Ehrmann, Matteo Romanello, Simon Clematide, Phillip Ströbel, Raphaël Barman. 958-968
- Allgemeine Musikalische Zeitung as a Searchable Online Corpus. Bernd Kampe, Tinghui Duan, Udo Hahn. 969-976
- Stylometry in a Bilingual Setup. Silvie Cinková, Jan Rybicki. 977-984
- Dialect Clustering with Character-Based Metrics: in Search of the Boundary of Language and Dialect. Yo Sato, Kevin Heffernan. 985-990
- DiscSense: Automated Semantic Analysis of Discourse Markers. Damien Sileo, Tim Van de Cruys, Camille Pradel, Philippe Muller. 991-999
- ThemePro: A Toolkit for the Analysis of Thematic Progression. Mónica Domínguez, Juan Soler Company, Leo Wanner. 1000-1007
- Machine-Aided Annotation for Fine-Grained Proposition Types in Argumentation. Yohan Jo, Elijah Mayfield, Chris Reed, Eduard H. Hovy. 1008-1018
- Chinese Discourse Parsing: Model and Evaluation. Lin Chuan-An, Shyh-Shiun Hung, Hen-Hsen Huang, Hsin-Hsi Chen. 1019-1024
- Shallow Discourse Annotation for Chinese TED Talks. Wanqiu Long, Xinyi Cai, James E. M. Reid, Bonnie Webber, Deyi Xiong. 1025-1032
- The Discussion Tracker Corpus of Collaborative Argumentation. Christopher Olshefski, Luca Lugini, Ravneet Singh, Diane J. Litman, Amanda Godley. 1033-1043
- Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection. Henny Sluyter-Gäthje, Peter Bourgonje, Manfred Stede. 1044-1050
- A Corpus of Encyclopedia Articles with Logical Forms. Nathan Rasmussen, William Schuler. 1051-1060
- The Potsdam Commentary Corpus 2.2: Extending Annotations for Shallow Discourse Parsing. Peter Bourgonje, Manfred Stede. 1061-1066
- On the Creation of a Corpus for Coherence Evaluation of Discursive Units. Elham Mohammadi, Timothe Beiko, Leila Kosseim. 1067-1072
- Joint Learning of Syntactic Features Helps Discourse Segmentation. Takshak Desai, Parag Dakle, Dan I. Moldovan. 1073-1080
- Creating a Corpus of Gestures and Predicting the Audience Response based on Gestures in Speeches of Donald Trump. Verena Ruf, Costanza Navarretta. 1081-1088
- GeCzLex: Lexicon of Czech and German Anaphoric Connectives. Lucie Poláková, Katerina Rysova, Magdaléna Rysová, Jirí Mírovský. 1089-1096
- DiMLex-Bangla: A Lexicon of Bangla Discourse Connectives. Debopam Das, Manfred Stede, Soumya Sankar Ghosh, Lahari Chatterjee. 1097-1102
- Semi-Supervised Tri-Training for Explicit Discourse Argument Expansion. René Knaebel, Manfred Stede. 1103-1109
- WikiPossessions: Possession Timeline Generation as an Evaluation Benchmark for Machine Reading Comprehension of Long Texts. Dhivya Chinnappa, Alexis Palmer, Eduardo Blanco. 1110-1117
- TED-Q: TED Talks and the Questions they Evoke. Matthijs Westera, Laia Mayol, Hannah Rohde. 1118-1127
- CzeDLex 0.6 and its Representation in the PML-TQ. Jirí Mírovský, Lucie Poláková, Pavlína Synková. 1128-1134
- Corpus for Modeling User Interactions in Online Persuasive Discussions. Ryo Egawa, Gaku Morio, Katsuhide Fujita. 1135-1141
- Simplifying Coreference Chains for Dyslexic Children. Rodrigo Wilkens, Amalia Todirascu. 1142-1151
- Adapting BERT to Implicit Discourse Relation Classification with a Focus on Discourse Connectives. Yudai Kishimoto, Yugo Murawaki, Sadao Kurohashi. 1152-1158
- What Speakers really Mean when they Ask Questions: Classification of Intentions with a Supervised Approach. Angèle Barbedette, Iris Eshkol-Taravella. 1159-1166
- Modeling Dialogue in Conversational Cognitive Health Screening Interviews. Shahla Farzana, Mina Valizadeh, Natalie Parde. 1167-1177
- Stigma Annotation Scheme and Stigmatized Language Detection in Health-Care Discussions on Social Media. Nadiya Straton, Hyeju Jang, Raymond T. Ng. 1178-1190
- An Annotated Dataset of Discourse Modes in Hindi Stories. Swapnil Dhanwal, Hritwik Dutta, Hitesh Nankani, Nilay Shrivastava, Yaman Kumar, Junyi Jessy Li, Debanjan Mahata, Rakesh Gosangi, Haimin Zhang, Rajiv Ratn Shah, Amanda Stent. 1191-1196
- Multi-class Multilingual Classification of Wikipedia Articles Using Extended Named Entity Tag Set. Hassan S. Shavarani, Satoshi Sekine. 1197-1201
- An Algerian Corpus and an Annotation Platform for Opinion and Emotion Analysis. Leila Moudjari, Karima Akli-Astouati, Farah Benamara. 1202-1210
- Transfer Learning from Transformers to Fake News Challenge Stance Detection (FNC-1) Task. Valeriya Slovikovskaya, Giuseppe Attardi. 1211-1218
- Scientific Statement Classification over arXiv.org. Deyan Ginev, Bruce R. Miller. 1219-1226
- Cross-domain Author Gender Classification in Brazilian Portuguese. Rafael Dias, Ivandré Paraboni. 1227-1234
- LEDGAR: A Large-Scale Multi-label Corpus for Text Classification of Legal Provisions in Contracts. Don Tuggener, Pius von Däniken, Thomas Peetz, Mark Cieliebak. 1235-1241
- Online Near-Duplicate Detection of News Articles. Simon Rodier, Dave Carter. 1242-1249
- Automated Essay Scoring System for Nonnative Japanese Learners. Reo Hirao, Mio Arai, Hiroki Shimanaka, Satoru Katsumata, Mamoru Komachi. 1250-1257
- A Real-World Data Resource of Complex Sensitive Sentences Based on Documents from the Monsanto Trial. Jan Neerbek, Morten Eskildsen, Peter Dolog, Ira Assent. 1258-1267
- Discovering Biased News Articles Leveraging Multiple Human Annotations. Konstantina Lazaridou, Alexander Löser, Maria Mestre, Felix Naumann. 1268-1277
- Corpora and Baselines for Humour Recognition in Portuguese. Hugo Gonçalo Oliveira, André Clemêncio, Ana Alves. 1278-1285
- FactCorp: A Corpus of Dutch Fact-checks and its Multiple Usages. Marten van der Meulen, W. Gudrun Reijnierse. 1286-1292
- Automatic Orality Identification in Historical Texts. Katrin Ortmann, Stefanie Dipper. 1293-1302
- Using Deep Neural Networks with Intra- and Inter-Sentence Context to Classify Suicidal Behaviour. Xingyi Song, Johnny Downs, Sumithra Velupillai, Rachel Holden, Maxim Kikoler, Kalina Bontcheva, Rina Dutta, Angus Roberts. 1303-1310
- A First Dataset for Film Age Appropriateness Investigation. Emad Mohamed, Le An Ha. 1311-1317
- Habibi - a multi Dialect multi National Arabic Song Lyrics Corpus. Mahmoud El-Haj. 1318-1326
- Age Suitability Rating: Predicting the MPAA Rating Based on Movie Dialogues. Mahsa Shafaei, Niloofar Safi Samghabadi, Sudipta Kar, Thamar Solorio. 1327-1335
- Email Classification Incorporating Social Networks and Thread Structure. Sakhar B. Alkhereyfy, Owen Rambow. 1336-1345
- Development and Validation of a Corpus for Machine Humor Comprehension. Yuen-Hsien Tseng, Wun-Syuan Wu, Chia-Yueh Chang, Hsueh-Chih Chen, Wei-Lun Hsu. 1346-1352
- Alector: A Parallel Corpus of Simplified French Texts with Alignments of Misreadings by Poor and Dyslexic Readers. Núria Gala, Anaïs Tack, Ludivine Javourey-Drevet, Thomas François, Johannes C. Ziegler. 1353-1361
- A Corpus for Detecting High-Context Medical Conditions in Intensive Care Patient Notes Focusing on Frequently Readmitted Patients. Edward T. Moseley, Joy T. Wu, Jonathan Welt, John Foote Jr., Patrick D. Tyler, David W. Grant, Eric T. Carlson, Sebastian Gehrmann, Franck Dernoncourt, Leo Anthony Celi. 1362-1367
- Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus. Elena Zotova, Rodrigo Agerri, Manuel Núñez, German Rigau. 1368-1375
- An Evaluation of Progressive Neural Networks for Transfer Learning in Natural Language Processing. Abdul Moeed, Gerhard Hagerer, Sumit Dugar, Sarthak Gupta, Mainak Ghosh, Hannah Danner, Oliver Mitevski, Andreas Nawroth, Georg Groh. 1376-1381
- WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection. Noé Cecillon, Vincent Labatut, Richard Dufour, Georges Linarès. 1382-1390
- FloDusTA: Saudi Tweets Dataset for Flood, Dust Storm, and Traffic Accident Events. Btool Hamoui, Mourad Mars, Khaled Almotairi. 1391-1396
- An Annotated Corpus for Sexism Detection in French Tweets. Patricia Chiril, Véronique Moriceau, Farah Benamara, Alda Mari, Gloria Origgi, Marlène Coulomb-Gully. 1397-1403
- Measuring the Impact of Readability Features in Fake News Detection. Roney L. S. Santos, Gabriela Wick-Pedro, Sidney Leal, Oto A. Vale, Thiago A. S. Pardo, Kalina Bontcheva, Carolina Scarton. 1404-1413
- When Shallow is Good Enough: Automatic Assessment of Conceptual Text Complexity using Shallow Semantic Features. Sanja Stajner, Ioana Hulpus. 1414-1422
- DecOp: A Multilingual and Multi-domain Corpus For Detecting Deception In Typed Text. Pasquale Capuozzo, Ivano Lauriola, Carlo Strapparava, Fabio Aiolli, Giuseppe Sartori. 1423-1430
- Age Recommendation for Texts. Alexis Blandin, Gwénolé Lecorvé, Delphine Battistelli, Aline Étienne. 1431-1439
- Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition. Xiaolei Huang, Linzi Xing, Franck Dernoncourt, Michael J. Paul. 1440-1448
- VICTOR: a Dataset for Brazilian Legal Documents Classification. Pedro Henrique Luz de Araujo, Teófilo Emídio de Campos, Fabricio Ataides Braz, Nilton Correia da Silva. 1449-1458
- Dynamic Classification in Web Archiving Collections. Krutarth Patel, Cornelia Caragea, Mark Phillips. 1459-1468
- Aspect Flow Representation and Audio Inspired Analysis for Texts. Larissa Vasconcelos, Cláudio E. C. Campelo, Caio Libânio Melo Jerônimo. 1469-1477
- Annotating and Analyzing Biased Sentences in News Articles using Crowdsourcing. Sora Lim, Adam Jatowt, Michael Färber, Masatoshi Yoshikawa. 1478-1484
- Evaluation of Deep Gaussian Processes for Text Classification. P. Jayashree, P. K. Srijith. 1485-1491
- EmoEvent: A Multilingual Emotion Corpus based on different Events. Flor Miriam Plaza del Arco, Carlo Strapparava, Luis Alfonso Ureña López, María-Teresa Martín Valdivia. 1492-1498
- MuSE: a Multimodal Dataset of Stressed Emotion. Mimansa Jaiswal, Cristian-Paul Bara, Yuanhang Luo, Mihai Burzo, Rada Mihalcea, Emily Mower Provost. 1499-1510
- Affect in Tweets: A Transfer Learning Approach. Linrui Zhang, Hsin-Lun Huang, Yang Yu, Dan Moldovan. 1511-1516
- Annotation of Emotion Carriers in Personal Narratives. Aniruddha Tammewar, Alessandra Cervone, Eva-Maria Messner, Giuseppe Riccardi. 1517-1525
- Towards Interactive Annotation for Hesitation in Conversational Speech. Jane Wottawa, Marie Tahon, Apolline Marin, Nicolas Audibert. 1526-1532
- Abusive language in Spanish children and young teenager's conversations: data preparation and short text classification with contextual word embeddings. Marta R. Costa-Jussà, Esther González, Asunción Moreno, Eudald Cumalat. 1533-1537
- IIIT-H TEMD Semi-Natural Emotional Speech Database from Professional Actors and Non-Actors. Banothu Rambabu, Kishore Kumar Botsa, P. Gangamohan, Suryakanth V. Gangashetty. 1538-1545
- The POTUS Corpus, a Database of Weekly Addresses for the Study of Stance in Politics and Virtual Agents. Thomas Janssoone, Kévin Bailly, Gaël Richard, Chloé Clavel. 1546-1553
- GoodNewsEveryone: A Corpus of News Headlines Annotated with Emotions, Semantic Roles, and Reader Perception. Laura Ana Maria Bostan, Evgeny Kim, Roman Klinger. 1554-1566
- SOLO: A Corpus of Tweets for Examining the State of Being Alone. Svetlana Kiritchenko, Will E. Hipson, Robert Coplan, Saif M. Mohammad. 1567-1577
- PoKi: A Large Dataset of Poems by Children. Will E. Hipson, Saif M. Mohammad. 1578-1589
- AlloSat: A New Call Center French Corpus for Satisfaction and Frustration Analysis. Manon Macary, Marie Tahon, Yannick Estève, Anthony Rousseau. 1590-1597
- Learning the Human Judgment for the Automatic Evaluation of Chatbot. Shih-Hung Wu, Sheng-Lun Chien. 1598-1602
- Korean-Specific Emotion Annotation Procedure Using N-Gram-Based Distant Supervision and Korean-Specific-Feature-Based Distant Supervision. Young-Jun Lee, Chae-Gyun Lim, Ho-Jin Choi. 1603-1610
- Semi-Automatic Construction and Refinement of an Annotated Corpus for a Deep Learning Framework for Emotion Classification. Jiajun Xu, Kyosuke Masuda, Hiromitsu Nishizaki, Fumiyo Fukumoto, Yoshimi Suzuki. 1611-1617
- CEASE, a Corpus of Emotion Annotated Suicide notes in English. Soumitra Ghosh, Asif Ekbal, Pushpak Bhattacharyya. 1618-1626
- Training a Broad-Coverage German Sentiment Classification Model for Dialog Systems. Oliver Guhr, Anne-Kathrin Schumann, Frank Bahrmann, Hans-Joachim Böhme. 1627-1632
- An Event-comment Social Media Corpus for Implicit Emotion Analysis. Sophia Yat Mei Lee, Helena Yan Ping Lau. 1633-1642
- An Emotional Mess! Deciding on a Framework for Building a Dutch Emotion-Annotated Corpus. Luna De Bruyne, Orphée De Clercq, Véronique Hoste. 1643-1651
- PO-EMO: Conceptualization, Annotation, and Modeling of Aesthetic Emotions in German and English Poetry. Thomas Haider, Steffen Eger, Evgeny Kim, Roman Klinger, Winfried Menninghaus. 1652-1663
- Learning Word Ratings for Empathy and Distress from Document-Level User Responses. João Sedoc, Sven Buechel, Yehonathan Nachmany, Anneke Buffone, Lyle Ungar. 1664-1673
- Evaluation of Sentence Representations in Polish. Slawomir Dadas, Michal Perelkiewicz, Rafal Poswiata. 1674-1680
- Identification of Primary and Collateral Tracks in Stuttered Speech. Rachid Riad, Anne-Catherine Bachoud-Lévi, Frank Rudzicz, Emmanuel Dupoux. 1681-1688
- How to Compare Automatically Two Phonological Strings: Application to Intelligibility Measurement in the Case of Atypical Speech. Alain Ghio, Muriel Lalain, Laurence Giusti, Corinne Fredouille, Virginie Woisard. 1689-1694
- Evaluating Text Coherence at Sentence and Paragraph Levels. Sennan Liu, Shuang Zeng, Sujian Li. 1695-1703
- HardEval: Focusing on Challenging Tokens to Assess Robustness of NER. Gabriel Bernier-Colborne, Philippe Langlais. 1704-1711
- An Evaluation Dataset for Identifying Communicative Functions of Sentences in English Scholarly Papers. Kenichi Iwatsuki, Florian Boudin, Akiko Aizawa. 1712-1720
- An Automatic Tool For Language Evaluation. Fabio Fassetti, Ilaria Fassetti. 1721-1726
- Which Evaluations Uncover Sense Representations that Actually Make Sense? Jordan L. Boyd-Graber, Fenfei Guo, Leah Findlater, Mohit Iyyer. 1727-1738
- Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections. Yi-An Lai, Xuan Zhu, Yi Zhang, Mona Diab. 1739-1746
- Towards Few-Shot Event Mention Retrieval: An Evaluation Framework and A Siamese Network ApproachBonan Min, Yee Seng Chan, Lingjun Zhao. 1747-1752 [doi]
- Linguistic Appropriateness and Pedagogic Usefulness of Reading Comprehension QuestionsAndrea Horbach, Itziar Aldabe, Marie Bexte, Oier Lopez de Lacalle, Montse Maritxalar. 1753-1762 [doi]
- Dataset Reproducibility and IR Methods in Timeline SummarizationLeo Born, Maximilian Bacher, Katja Markert. 1763-1771 [doi]
- Database Search vs. Information Retrieval: A Novel Method for Studying Natural Language Querying of Semi-Structured DataStefanie Nadig, Martin Braschler, Kurt Stockinger. 1772-1779 [doi]
- Why Attention is Not Explanation: Surgical Intervention and Causal Reasoning about Neural ModelsChristopher Grimsley, Elijah Mayfield, Julia R. S. Bursten. 1780-1790 [doi]
- Have a Cake and Eat it Too: Assessing Discriminating Performance of an Intelligibility Index Obtained from a Reduced Sample SizeAnna K. Marczyk, Alain Ghio, Muriel Lalain, Marie Rebourg, Corinne Fredouille, Virginie Woisard. 1791-1795 [doi]
- Evaluation Metrics for Headline Generation Using Deep Pre-Trained EmbeddingsAbdul Moeed, Yang An, Gerhard Hagerer, Georg Groh. 1796-1802 [doi]
- LinCE: A Centralized Benchmark for Linguistic Code-switching EvaluationGustavo Aguilar, Sudipta Kar, Thamar Solorio. 1803-1813 [doi]
- Paraphrase Generation and Evaluation on Colloquial-Style SentencesEetu Sjöblom, Mathias Creutz, Yves Scherrer. 1814-1822 [doi]
- Analyzing Word Embedding Through Structural Equation ModelingNamgi Han, Katsuhiko Hayashi, Yusuke Miyao. 1823-1832 [doi]
- Evaluation of Lifelong Learning SystemsYevhenii Prokopalo, Sylvain Meignier, Olivier Galibert, Loïc Barrault, Anthony Larcher. 1833-1841 [doi]
- Interannotator Agreement for Lexico-Semantic Annotation of a CorpusElzbieta Hajnicz. 1842-1848 [doi]
- An In-Depth Comparison of 14 Spelling Correction Tools on a Common BenchmarkMarkus Näther. 1849-1857 [doi]
- Sentence Level Human Translation Quality Estimation with Attention-based Neural NetworksYu Yuan, Serge Sharoff. 1858-1865 [doi]
- Evaluating Language Tools for Fifteen EU-official Under-resourced LanguagesDiego Alves, Gaurish Thakkar, Marko Tadic. 1866-1873 [doi]
- Word Embedding Evaluation for SinhalaDimuthu Lakmal, Surangika Ranathunga, Saman Peramuna, Indu Herath. 1874-1881 [doi]
- Stress Test Evaluation of Transformer-based Models in Natural Language Understanding TasksCarlos Aspillaga, Andrés Carvallo, Vladimir Araujo. 1882-1894 [doi]
- Brand-Product Relation Extraction Using Heterogeneous Vector Space RepresentationsArkadiusz Janz, Lukasz Kopociński, Maciej Piasecki, Agnieszka Pluwak. 1895-1901 [doi]
- A Tale of Three Parsers: Towards Diagnostic Evaluation for Meaning Representation ParsingMaja Buljan, Joakim Nivre, Stephan Oepen, Lilja Øvrelid. 1902-1909 [doi]
- Headword-Oriented Entity Linking: A Special Entity Linking Task with Dataset and BaselineMu Yang, Chi-Yen Chen, Yi-Hui Lee, Qian-hui Zeng, Wei-Yun Ma, Chen-Yang Shih, Wei-Jhih Chen. 1910-1917 [doi]
- TableBank: Table Benchmark for Image-based Table Detection and RecognitionMinghao Li, Lei Cui 0001, Shaohan Huang, Furu Wei, Ming Zhou 0001, Zhoujun Li. 1918-1925 [doi]
- WIKIR: A Python Toolkit for Building a Large-scale Wikipedia-based English Information Retrieval DatasetJibril Frej, Didier Schwab, Jean-Pierre Chevallet. 1926-1933 [doi]
- Constructing a Public Meeting CorpusKoji Tanaka, Chenhui Chu, Haolin Ren, Benjamin Renoust, Yuta Nakashima, Noriko Takemura, Hajime Nagahara, Takao Fujikawa. 1934-1940 [doi]
- Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific LiteratureFusataka Kuniyoshi, Kohei Makino, Jun Ozawa, Makoto Miwa. 1941-1950 [doi]
- WEXEA: Wikipedia EXhaustive Entity AnnotationMichael Strobl, Amine Trabelsi, Osmar R. Zaïane. 1951-1958 [doi]
- Handling Entity Normalization with no Annotated Corpus: Weakly Supervised Methods Based on Distributional Representation and Ontological InformationArnaud Ferré, Robert Bossy, Mouhamadou Ba, Louise Deléger, Thomas Lavergne, Pierre Zweigenbaum, Claire Nédellec. 1959-1966 [doi]
- HBCP Corpus: A New Resource for the Analysis of Behavioural Change Intervention ReportsFrancesca Bonin, Martin Gleize, Ailbhe Finnerty, Candice Moore, Charles Jochim, Emma Norris, Yufang Hou, Alison J. Wright, Debasis Ganguly, Emily Hayes, Silje Zink, Alessandra Pascale, Pol Mac Aonghusa, Susan Michie. 1967-1975 [doi]
- Cross-lingual Structure Transfer for Zero-resource Event ExtractionDi Lu, Ananya Subburathinam, Heng Ji, Jonathan May, Shih-Fu Chang, Avirup Sil, Clare R. Voss. 1976-1981 [doi]
- Cross-Domain Evaluation of Edge Detection for Biomedical Event ExtractionAlan Ramponi, Barbara Plank, Rosario Lombardo. 1982-1989 [doi]
- Semantic Annotation for Improved Safety in Construction WorkPaul Thompson, Tim Yates, Emrah Inan, Sophia Ananiadou. 1990-1999 [doi]
- Social Web Observatory: A Platform and Method for Gathering Knowledge on Entities from Different Textual SourcesLeonidas Tsekouras, Georgios Petasis, George Giannakopoulos, Aris Kosmopoulos. 2000-2008 [doi]
- Development of a Corpus Annotated with Medications and their Attributes in Psychiatric Health RecordsJaya Chaturvedi, Natalia Viani, Jyoti Sanyal, Chloe Tytherleigh, Idil Hasan, Kate Baird, Sumithra Velupillai, Robert Stewart 0002, Angus Roberts. 2009-2016 [doi]
- Do not let the history haunt you: Mitigating Compounding Errors in Conversational Question AnsweringAngrosh Mandya, James O'Neill, Danushka Bollegala, Frans Coenen. 2017-2025 [doi]
- CLEEK: A Chinese Long-text Corpus for Entity LinkingWeixin Zeng, Xiang Zhao 0002, Jiuyang Tang, Zhen Tan, Xuqian Huang. 2026-2035 [doi]
- The Medical Scribe: Corpus Development and Model Performance AnalysesIzhak Shafran, Nan Du, Linh Tran, Amanda Perry, Lauren Keyes, Mark Knichel, Ashley Domin, Lei Huang, Yuhui Chen, Gang Li, Mingqiu Wang, Laurent El Shafey, Hagen Soltau, Justin S. Paul. 2036-2044 [doi]
- A Contract Corpus for Recognizing Rights and ObligationsRuka Funaki, Yusuke Nagata, Kohei Suenaga, Shinsuke Mori. 2045-2053 [doi]
- Recognition of Implicit Geographic Movement in TextScott Pezanowski, Prasenjit Mitra. 2054-2063 [doi]
- Extraction of the Argument Structure of Tokyo Metropolitan Assembly Minutes: Segmentation of Question-and-Answer SetsKeiichi Takamaru, Yasutomo Kimura, Hideyuki Shibuki, Hokuto Ototake, Yuzu Uchida, Kotaro Sakamoto, Madoka Ishioroshi, Teruko Mitamura, Noriko Kando. 2064-2068 [doi]
- A Term Extraction Approach to Survey Analysis in Health CareCécile Robin, Mona Isazad Mashinchi, Fatemeh Ahmadi Zeleti, Adegboyega Ojo, Paul Buitelaar. 2069-2077 [doi]
- A Scientific Information Extraction Dataset for Nature Inspired EngineeringRuben Kruiper, Julian F. V. Vincent, Jessica Chen-Burger, Marc P. Y. Desmulliez, Ioannis Konstas. 2078-2085 [doi]
- Automated Discovery of Mathematical Definitions in TextNatalia Vanetik, Marina Litvak, Sergey Shevchuk, Lior Reznik. 2086-2094 [doi]
- WN-Salience: A Corpus of News Articles with Entity Salience AnnotationsChuan Wu, Evangelos Kanoulas, Maarten de Rijke, Wei Lu. 2095-2102 [doi]
- Event Extraction from Unstructured Amharic TextEphrem Tadesse, Rosa Tsegaye, Kuulaa Qaqqabaa. 2103-2109 [doi]
- Comparing Machine Learning and Deep Learning Approaches on NLP Tasks for the Italian LanguageBernardo Magnini, Alberto Lavelli, Simone Magnolini. 2110-2119 [doi]
- MyFixit: An Annotated Dataset, Annotation Tool, and Baseline Methods for Information Extraction from Repair ManualsNima Nabizadeh, Dorothea Kolossa, Martin Heckmann. 2120-2128 [doi]
- Towards Entity SpacesMarieke van Erp, Paul Groth. 2129-2137 [doi]
- Love Me, Love Me, Say (and Write!) that You Love Me: Enriching the WASABI Song Corpus with Lyrics AnnotationsMichael Fell, Elena Cabrio, Elmahdi Korfed, Michel Buffa, Fabien Gandon. 2138-2147 [doi]
- Evaluating Information Loss in Temporal Dependency TreesMustafa Ocal, Mark A. Finlayson. 2148-2156 [doi]
- Populating Legal Ontologies using Semantic Role LabelingLlio Humphreys, Guido Boella, Luigi Di Caro, Livio Robaldo, Leon van der Torre, Sepideh Ghanavati, Robert Muthuri. 2157-2166 [doi]
- PST 2.0 - Corpus of Polish Spatial TextsMichal Marcinczuk, Marcin Oleksy, Jan Wieczorek. 2167-2174 [doi]
- Natural Language Premise Selection: Finding Supporting Statements for Mathematical TextDeborah Ferreira, André Freitas. 2175-2182 [doi]
- Odinson: A Fast Rule-based Information Extraction FrameworkMarco Antonio Valenzuela-Escárcega, Gus Hahn-Powell, Dane Bell. 2183-2191 [doi]
- The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic SourcesJennifer D'Souza, Anett Hoppe, Arthur Brack, Mohamad Yaser Jaradeh, Sören Auer, Ralph Ewerth. 2192-2203 [doi]
- MathAlign: Linking Formula Identifiers to their Contextual Natural Language DescriptionsMaria Alexeeva, Rebecca Sharp, Marco Antonio Valenzuela-Escárcega, Jennifer Kadowaki, Adarsh Pyarelal, Clayton T. Morrison. 2204-2212 [doi]
- Domain Adapted Distant Supervision for Pedagogically Motivated Relation ExtractionOscar Sainz, Oier Lopez de Lacalle, Itziar Aldabe, Montse Maritxalar. 2213-2222 [doi]
- Temporal Histories of Epidemic Events (THEE): A Case Study in Temporal Annotation for Public HealthJingcheng Niu, Victoria Ng, Gerald Penn, Erin E. Rees. 2223-2230 [doi]
- Exploiting Citation Knowledge in Personalised Recommendation of Recent Scientific PublicationsAnita Khadka, Iván Cantador, Miriam Fernández. 2231-2240 [doi]
- A Platform for Event Extraction in HindiSovan Kumar Sahoo, Saumajit Saha, Asif Ekbal, Pushpak Bhattacharyya. 2241-2250 [doi]
- Rad-SpatialNet: A Frame-based Resource for Fine-Grained Spatial Relations in Radiology ReportsSurabhi Datta, Morgan Ulinski, Jordan Godfrey-Stovall, Shekhar Khanpara, Roy Riascos-Castaneda, Kirk Roberts. 2251-2260 [doi]
- NLP Analytics in Finance with DoRe: A French 250M Tokens Corpus of Corporate Annual ReportsCorentin Masson, Patrick Paroubek. 2261-2267 [doi]
- The Language of Brain Signals: Natural Language Processing of Electroencephalography ReportsRamón Maldonado, Sanda M. Harabagiu. 2268-2275 [doi]
- Humans Keep It One Hundred: an Overview of AI JourneyTatiana Shavrina, Anton A. Emelyanov, Alena Fenogenova, Vadim Fomin, Vladislav Mikhailov, Andrey Evlampiev, Valentin Malykh, Vladimir Larin, Alex Natekin, Aleksandr Vatulin, Peter Romov, Daniil Anastasiev, Nikolai Zinov, Andrey Chertok. 2276-2284 [doi]
- Towards Data-driven Ontologies: a Filtering Approach using Keywords and Natural Language ConstructsMaaike de Boer, Jack P. C. Verhoosel. 2285-2292 [doi]
- A French Corpus and Annotation Schema for Named Entity Recognition and Relation Extraction of Financial NewsAli Jabbari, Olivier Sauvage, Hamada Zeine, Hamza Chergui. 2293-2299 [doi]
- Inferences for Lexical Semantic Resource Building with Less SupervisionNadia Bebeshina, Mathieu Lafourcade. 2300-2305 [doi]
- Acquiring Social Knowledge about Personality and Driving-related BehaviorRitsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi. 2306-2315 [doi]
- Implicit Knowledge in Argumentative Texts: An Annotated CorpusMaria Becker, Katharina Korfhage, Anette Frank. 2316-2324 [doi]
- Multiple Knowledge GraphDB (MKGDB)Stefano Faralli 0001, Paola Velardi, Farid Yusifli. 2325-2331 [doi]
- Orchestrating NLP Services for the Legal DomainJulián Moreno Schneider, Georg Rehm, Elena Montiel-Ponsoda, Víctor Rodríguez-Doncel, Artem Revenko, Sotirios Karampatakis, Maria Khvalchik, Christian Sageder, Jorge Gracia, Filippo Maganza. 2332-2340 [doi]
- Evaluation Dataset and Methodology for Extracting Application-Specific Taxonomies from the Wikipedia Knowledge GraphGeorgeta Bordea, Stefano Faralli 0001, Fleur Mougin, Paul Buitelaar, Gayo Diallo. 2341-2347 [doi]
- Subjective Evaluation of Comprehensibility in Movie InteractionsEstelle I. S. Randria, Lionel Fontan, Maxime Le Coz, Isabelle Ferrané, Julien Pinquier. 2348-2357 [doi]
- Representing Multiword Term Variation in a Terminological Knowledge Base: a Corpus-Based StudyPilar León Araúz, Arianne Reimerink, Melania Cabezas-García. 2358-2367 [doi]
- Understanding Spatial Relations through Multiple ModalitiesSoham Dan, Hangfeng He, Dan Roth. 2368-2372 [doi]
- A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource LanguagesDwaipayan Roy, Sumit Bhatia, Prateek Jain. 2373-2380 [doi]
- Pártélet: A Hungarian Corpus of Propaganda Texts from the Hungarian Socialist EraZoltán Kmetty, Veronika Vincze, Dorottya Demszky, Orsolya Ring, Balázs Nagy, Martina Katalin Szabó. 2381-2388 [doi]
- DYWC: An Evaluation Data Set for Entity Linking Based on DBpedia, YAGO, Wikidata, and CrunchbaseKristian Noullet, Rico Mix, Michael Färber. 2389-2395 [doi]
- Eye4Ref: A Multimodal Eye Movement Dataset of Referentially Complex SituationsÖzge Alaçam, Eugen Ruppert, Amr Rekaby Salama, Tobias Staron, Wolfgang Menzel. 2396-2404 [doi]
- SiBert: Enhanced Chinese Pre-trained Language Model with Sentence InsertionJiahao Chen, Chenjie Cao, Xiuyan Jiang. 2405-2412 [doi]
- Processing South Asian Languages Written in the Latin Script: the Dakshina DatasetBrian Roark, Lawrence Wolf-Sonkin, Christo Kirov, Sabrina J. Mielke, Cibu Johny, Isin Demirsahin, Keith B. Hall. 2413-2423 [doi]
- GM-RKB WikiText Error Correction Task and BaselinesGabor Melli, Abdelrhman Eldallal, Bassim Lazem, Olga Moreira. 2424-2430 [doi]
- Embedding Space Correlation as a Measure of Domain SimilarityAnne Beyer, Göran Kauermann, Hinrich Schütze. 2431-2439 [doi]
- Wiki-40B: Multilingual Language Model DatasetMandy Guo, Zihang Dai, Denny Vrandecic, Rami Al-Rfou. 2440-2452 [doi]
- Know thy Corpus! Robust Methods for Digital Curation of Web corporaSerge Sharoff. 2453-2460 [doi]
- Evaluating Approaches to Personalizing Language ModelsMilton King, Paul Cook. 2461-2469 [doi]
- Class-based LSTM Russian Language Model with Linguistic InformationIrina S. Kipyatkova, Alexey Karpov 0001. 2470-2474 [doi]
- Adaptation of Deep Bidirectional Transformers for Afrikaans LanguageSello Ralethe. 2475-2478 [doi]
- FlauBERT: Unsupervised Language Model Pre-training for FrenchHang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab. 2479-2490 [doi]
- Accelerated High-Quality Mutual-Information Based Word ClusteringManuel R. Ciosici, Ira Assent, Leon Derczynski. 2491-2496 [doi]
- Rhythmic Proximity Between Natives And Learners Of French - Evaluation of a metric based on the CEFC corpusSylvain Coulange, Solange Rossato. 2497-2502 [doi]
- From Linguistic Resources to Ontology-Aware Terminologies: Minding the Representation GapGiulia Speranza, Maria Pia di Buono, Johanna Monti, Federico Sangati. 2503-2510 [doi]
- Modeling Factual Claims with Semantic FramesFatma Arslan, Josue Caraballo, Damian Jimenez, Chengkai Li. 2511-2520 [doi]
- Automatic Transcription Challenges for Inuktitut, a Low-Resource Polysynthetic LanguageVishwa Gupta, Gilles Boulianne. 2521-2527 [doi]
- Geographically-Balanced Gigaword Corpora for 50 Language VarietiesJonathan Dunn, Ben Adams. 2528-2536 [doi]
- Data Augmentation using Machine Translation for Fake News Detection in the Urdu LanguageMaaz Amjad, Grigori Sidorov, Alisa Zhila. 2537-2542 [doi]
- Evaluation of Greek Word EmbeddingsStamatis Outsios, Christos Karatsalos, Konstantinos Skianis, Michalis Vazirgiannis. 2543-2551 [doi]
- A Dataset of Mycenaean Linear B SequencesKaterina Papavassiliou, Gareth Owens, Dimitrios Kosmopoulos. 2552-2561 [doi]
- The Nunavut Hansard Inuktitut-English Parallel Corpus 3.0 with Preliminary Machine Translation ResultsEric Joanis, Rebecca Knowles, Roland Kuhn, Samuel Larkin, Patrick Littell, Chi-kiu Lo, Darlene A. Stewart, Jeffrey Micher. 2562-2572 [doi]
- Exploring Bilingual Word Embeddings for Hiligaynon, a Low-Resource LanguageLeah Michel, Viktor Hangya, Alexander M. Fraser. 2573-2580 [doi]
- A Finite-State Morphological Analyser for EvenkiAnna Zueva, Anastasia Kuznetsova, Francis Tyers. 2581-2589 [doi]
- Morphology-rich Alphasyllabary EmbeddingsAmanuel Mersha, Stephen Wu 0004. 2590-2595 [doi]
- Localization of Fake News Detection via Multitask Transfer LearningJan Christian Blaise Cruz, Julianne Agatha Tan, Charibeth Cheng. 2596-2604 [doi]
- Evaluating Sentence Segmentation in Different Datasets of Neuropsychological Language Tests in Brazilian PortugueseEdresson Casanova, Marcos V. Treviso, Lilian Hübner, Sandra M. Aluísio. 2605-2614 [doi]
- Jejueo Datasets for Machine Translation and Speech SynthesisKyubyong Park, Yo Joong Choe, Jiyeon Ham. 2615-2621 [doi]
- Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu LanguageKohei Matsuura, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara. 2622-2628 [doi]
- Development of a Guarani - Spanish Parallel CorpusLuis Chiruzzo, Pedro J. Amarilla, Adolfo A. Rios, Gustavo Giménez Lugo. 2629-2633 [doi]
- AR-ASAG An ARabic Dataset for Automatic Short Answer Grading EvaluationLeila Ouahrani, Djamal Bennouar. 2634-2643 [doi]
- Processing Language Resources of Under-Resourced and Endangered Languages for the Generation of Augmentative Alternative Communication BoardsAnne Ferger. 2644-2648 [doi]
- The Nisvai Corpus of Oral Narrative Practices from Malekula (Vanuatu) and its Associated Language ResourcesJocelyn Aznar, Núria Gala. 2649-2656 [doi]
- Building a Time-Aligned Cross-Linguistic Reference Corpus from Language Documentation Data (DoReCo)Ludger Paschen, François Delafontaine, Christoph Draxler, Susanne Fuchs, Matthew Stave, Frank Seifart. 2657-2666 [doi]
- Benchmarking Neural and Statistical Machine Translation on Low-Resource African LanguagesKevin Duh, Paul McNamee, Matt Post, Brian Thompson. 2667-2675 [doi]
- Improved Finite-State Morphological Analysis for St. Lawrence Island Yupik Using Paradigm Function MorphologyEmily Chen, Hyunji Hayley Park, Lane Schwartz. 2676-2684 [doi]
- Towards a Spell Checker for Zamboanga Chavacano OrthographyMarcelo Yuji Himoro, Antonio Pareja-Lora. 2685-2697 [doi]
- Identifying Sentiments in Algerian Code-switched User-generated CommentsWafia Adouane, Samia Touileb, Jean-Philippe Bernardy. 2698-2705 [doi]
- Automatic Creation of Text Corpora for Low-Resource Languages from the Internet: The Case of Swiss GermanLucy Linder, Michael Jungo, Jean Hennebert, Claudiu Cristian Musat, Andreas Fischer 0002. 2706-2711 [doi]
- Evaluating Sub-word Embeddings in Cross-lingual ModelsAli Hakimi Parizi, Paul Cook. 2712-2719 [doi]
- A Swiss German Dictionary: Variation in Speech and WritingLarissa Schmidt, Lucy Linder, Sandra Djambazovska, Alexandros Lazaridis, Tanja Samardzic, Claudiu Musat. 2720-2725 [doi]
- Towards a Corsican Basic Language Resource KitLaurent Kevers, Stella Retali-Medori. 2726-2735 [doi]
- Evaluating the Impact of Sub-word Information and Cross-lingual Word Embeddings on Mi'kmaq Language ModellingJeremie Boudreau, Akankshya Patra, Ashima Suvarna, Paul Cook. 2736-2745 [doi]
- Exploring a Choctaw Language Corpus with Word Vectors and Minimum Distance LengthJacqueline Brixey, David J. Sides, Timothy Vizthum, David R. Traum, Khalil Iskarous. 2746-2753 [doi]
- Massive vs. Curated Embeddings for Low-Resourced Languages: the Case of Yorùbá and TwiJesujoba O. Alabi, Kwabena Amponsah-Kaakyire, David Ifeoluwa Adelani, Cristina España-Bonet. 2754-2762 [doi]
- TRopBank: Turkish PropBank V2.0Neslihan Kara, Deniz Baran Aslan, Büsra Marsan, Özge Bakay, Koray Ak, Olcay Taner Yildiz. 2763-2772 [doi]
- Collection and Annotation of the Romanian Legal CorpusDan Tufis, Maria Mitrofan, Vasile Florian Pais, Radu Ion, Andrei Coman. 2773-2777 [doi]
- An Empirical Evaluation of Annotation Practices in Corpora from Language DocumentationKilu von Prince, Sebastian Nordhoff. 2778-2787 [doi]
- Annotated Corpus for Sentiment Analysis in Odia LanguageGaurav Mohanty, Pruthwik Mishra, Radhika Mamidi. 2788-2795 [doi]
- Building a Task-oriented Dialog System for Languages with no Training Data: the Case for BasqueMaddalen Lopez de Lacalle, Xabier Saralegi, Iñaki San Vicente. 2796-2802 [doi]
- SENCORPUS: A French-Wolof Parallel CorpusElhadji Mamadou Nguer, Alla Lo, Cheikh M. Bamba Dione, Sileye O. Ba, Moussa Lo. 2803-2811 [doi]
- A Major Wordnet for a Minority Language: Scottish GaelicGábor Bella, Fiona McNeill, Rody Gorman, Caoimhin O. Donnaile, Kirsty MacDonald, Yamini Chandrashekar, Abed Alhakim Freihat, Fausto Giunchiglia. 2812-2818 [doi]
- Crowdsourcing Speech Data for Low-Resource Languages from Low-Income WorkersBasil Abraham, Danish Goel, Divya Siddarth, Kalika Bali, Manu Chopra, Monojit Choudhury, Pratik Joshi, Preethi Jyothi, Sunayana Sitaram, Vivek Seshadri. 2819-2826 [doi]
- A Resource for Studying Chatino Verbal MorphologyHilaria Cruz, Antonios Anastasopoulos, Gregory Stump. 2827-2831 [doi]
- Learnings from Technological Interventions in a Low Resource Language: A Case-Study on GondiDevansh Mehta, Sebastin Santy, Ramaravind Kommiya Mothilal, Brij Mohan Lal Srivastava, Alok Sharma, Anurag Shukla, Vishnu Prasad, Venkanna U, Amit Sharma, Kalika Bali. 2832-2838 [doi]
- Irony Detection in Persian Language: A Transfer Learning Approach Using Emoji PredictionPreni Golazizian, Behnam Sabeti, Seyed Arad Ashrafi Asli, Zahra Majdabadi, Omid Momenzadeh, Reza Fahmi. 2839-2845 [doi]
- Towards Computational Resource Grammars for Runyankore and RukigaDavid Bamutura, Peter Ljunglöf, Peter Nebende. 2846-2854 [doi]
- Optimizing Annotation Effort Using Active Learning Strategies: A Sentiment Analysis Case Study in PersianSeyed Arad Ashrafi Asli, Behnam Sabeti, Zahra Majdabadi, Preni Golazizian, Reza Fahmi, Omid Momenzadeh. 2855-2861 [doi]
- BanFakeNews: A Dataset for Detecting Fake News in BanglaMd Zobaer Hossain, Md Ashraful Rahman, Md. Saiful Islam, Sudipta Kar. 2862-2871 [doi]
- A Resource for Computational Experiments on MapudungunMingjun Duan, Carlos Fasola, Sai Krishna Rallabandi, Rodolfo Vega, Antonios Anastasopoulos, Lori S. Levin, Alan W. Black. 2872-2877 [doi]
- Automated Parsing of Interlinear Glossed Text from Page Images of Grammatical DescriptionsErich R. Round, Mark Ellison, Jayden L. Macklin-Cordes, Sacha Beniamine. 2878-2883 [doi]
- The Johns Hopkins University Bible Corpus: 1600+ Tongues for Typological ExplorationArya D. McCarthy, Rachel Wicks, Dylan Lewis, Aaron Mueller, Winston Wu, Oliver Adams, Garrett Nicolai, Matt Post, David Yarowsky. 2884-2892 [doi]
- Towards Building an Automatic Transcription System for Language Documentation: Experiences from MuyuAlexander Zahrer, Andrej Zgank, Barbara Schuppler. 2893-2900 [doi]
- Towards Flexible Cross-Resource Exploitation of Heterogeneous Language Documentation DataDaniel Jettka, Timm Lehmberg. 2901-2905 [doi]
- CantoMap: a Hong Kong Cantonese MapTask CorpusGrégoire Winterstein, Carmen Tang, Regine Lai. 2906-2913 [doi]
- No Data to Crawl? Monolingual Corpus Creation from PDF Files of Truly low-Resource Languages in PeruGina Bustamante, Arturo Oncevay, Roberto Zariquiey. 2914-2923 [doi]
- Creating a Parallel Icelandic Dependency Treebank from Raw Text to Universal DependenciesHildur Jónsdóttir, Anton Karl Ingason. 2924-2931 [doi]
- Building a Universal Dependencies Treebank for OccitanAleksandra Miletic, Myriam Bras, Marianne Vergez-Couret, Louise Esher, Clamença Poujade, Jean Sibille. 2932-2939 [doi]
- Building the Old Javanese WordnetDavid Moeljadi, Zakariya Pamuji Aminullah. 2940-2946 [doi]
- CPLM, a Parallel Corpus for Mexican Languages: Development and InterfaceGerardo Eugenio Sierra Martínez, Cynthia Montaño, Gemma Bel Enguix, Diego Córdova, Margarita Mota Montoya. 2947-2952 [doi]
- SiNER: A Large Dataset for Sindhi Named Entity RecognitionWazir Ali, Junyu Lu, Zenglin Xu. 2953-2961 [doi]
- Construct a Sense-Frame Aligned Predicate Lexicon for Chinese AMR CorpusLi Song, Yuling Dai, Yihuan Liu, Bin Li, Weiguang Qu. 2962-2969 [doi]
- MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel CorporaLifeng Han, Gareth J. F. Jones, Alan F. Smeaton. 2970-2979 [doi]
- A Myanmar (Burmese)-English Named Entity Transliteration DictionaryAye Myat Mon, Chenchen Ding, Hour Kaing, Khin Mar Soe, Masao Utiyama, Eiichiro Sumita. 2980-2983 [doi]
- CA-EHN: Commonsense Analogy from E-HowNetPeng-Hsuan Li, Tsan-Yu Yang, Wei-Yun Ma. 2984-2990 [doi]
- Building Semantic Grams of Human KnowledgeValentina Leone, Giovanni Siragusa, Luigi Di Caro, Roberto Navigli. 2991-3000 [doi]
- Automatically Building a Multilingual Lexicon of False Friends With No SupervisionAna Sabina Uban, Liviu P. Dinu. 3001-3007 [doi]
- A Parallel WordNet for English, Swedish and BulgarianKrasimir Angelov. 3008-3015 [doi]
- ENGLAWI: From Human- to Machine-Readable WiktionaryFranck Sajous, Basilio Calderone, Nabil Hathout. 3016-3026 [doi]
- Opening the Romance Verbal Inflection Dataset 2.0: A CLDF lexiconSacha Beniamine, Martin Maiden, Erich R. Round. 3027-3035 [doi]
- word2word: A Collection of Bilingual Lexicons for 3,564 Language PairsYo Joong Choe, Kyubyong Park, Dongwoo Kim. 3036-3045 [doi]
- Introducing Lexical Masks: a New Representation of Lexical Entries for Better Evaluation and Exchange of LexiconsBruno Cartoni, Daniel Calvelo Aros, Denny Vrandecic, Saran Lertpradit. 3046-3052 [doi]
- A Large-Scale Leveled Readability Lexicon for Standard ArabicMuhamed al Khalil, Nizar Habash, Zhengyang Jiang. 3053-3062 [doi]
- Preserving Semantic Information from Old Dictionaries: Linking Senses of the 'Altfranzösisches Wörterbuch' to WordNetAchim Stein. 3063-3068 [doi]
- Cifu: a Frequency Lexicon of Hong Kong CantoneseRegine Lai, Grégoire Winterstein. 3069-3077 [doi]
- Odi et Amo. Creating, Evaluating and Extending Sentiment Lexicons for LatinRachele Sprugnoli, Marco Passarotti, Daniela Corbetta, Andrea Peverelli. 3078-3086 [doi]
- WordWars: A Dataset to Examine the Natural Selection of WordsSaif M. Mohammad. 3087-3095 [doi]
- Challenge Dataset of Cognates and False Friend Pairs from Indian LanguagesDiptesh Kanojia, Malhar Kulkarni, Pushpak Bhattacharyya, Gholamreza Haffari. 3096-3102 [doi]
- Development of a Japanese Personality Dictionary based on Psychological MethodsRitsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi. 3103-3108 [doi]
- A Lexicon-Based Approach for Detecting Hedges in Informal TextJumayel Islam, Lu Xiao 0002, Robert E. Mercer. 3109-3113 [doi]
- Word Complexity Estimation for Japanese Lexical SimplificationDaiki Nishihara, Tomoyuki Kajiwara. 3114-3120 [doi]
- Inducing Universal Semantic Tag VectorsDa Huo, Gerard de Melo. 3121-3127 [doi]
- LexiDB: Patterns & Methods for Corpus Linguistic Database ManagementMatthew Coole, Paul Rayson, John Mariani. 3128-3135 [doi]
- Towards a Semi-Automatic Detection of Reflexive and Reciprocal Constructions and Their Representation in a Valency LexiconVáclava Kettnerová, Markéta Lopatková, Anna Vernerová, Petra Barancíková. 3136-3144 [doi]
- Languages Resources for Poorly Endowed Languages: The Case Study of Classical ArmenianChahan Vidal-Gorène, Aliénor Decours-Perez. 3145-3152 [doi]
- Constructing Web-Accessible Semantic Role Labels and Frames for Japanese as Additions to the NPCMJ Parsed CorpusKoichi Takeuchi, Alastair Butler, Iku Nagasaki, Takuya Okamura, Prashant Pardeshi. 3153-3161 [doi]
- Large-scale Cross-lingual Language Resources for Referencing and FramingPiek Vossen, Filip Ilievski, Marten Postma, Antske Fokkens, Gosse Minnema, Levi Remijnse. 3162-3171 [doi]
- Modelling Etymology in LMF/TEI: The Grande Dicionário Houaiss da Língua Portuguesa Dictionary as a Use CaseFahad Khan, Laurent Romary, Ana Salgado, Jack Bowers, Mohamed Khemakhem, Toma Tasovac. 3172-3180 [doi]
- Linking the TUFS Basic Vocabulary to the Open Multilingual WordnetFrancis Bond, Hiroki Nomoto, Luís Morgado da Costa, Arthur Bond. 3181-3188 [doi]
- Some Issues with Building a Multilingual WordnetFrancis Bond, Luís Morgado da Costa, Michael Wayne Goodman, John Philip McCrae, Ahti Lohk. 3189-3197 [doi]
- Collocations in Russian Lexicography and Russian Collocations DatabaseMaria Khokhlova. 3198-3206 [doi]
- Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB-2.0Clémentine Fourrier, Benoît Sagot. 3207-3216 [doi]
- OFrLex: A Computational Morphological and Syntactic Lexicon for Old FrenchGaël Guibon, Benoît Sagot. 3217-3225 [doi]
- Automatic Reconstruction of Missing Romanian Cognates and Unattested Latin WordsAlina Maria Ciobanu, Liviu P. Dinu, Laurentiu Zoicas. 3226-3231 [doi]
- A Multilingual Evaluation Dataset for Monolingual Word Sense AlignmentSina Ahmadi, John Philip McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S. Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, Thomas Troelsgård, Sussi Olsen, Simon Krek, Veronika Lipp, Tamás Váradi, László Simon, András Gyorffy, Carole Tiberius, Tanneke Schoonheim, Yifat Ben Moshe, Maya Rudich, Raya Abu Ahmad, Dorielle Lonke, Kira Kovalenko, Margit Langemets, Jelena Kallas, Oksana Dereza, Theodorus Fransen, David Cillessen, David Lindemann, Mikel Alonso, Ana Salgado, José-Luis Sancho, Rafael-J. Ureña-Ruiz, Jordi Porta Zamorano, Kiril Simov, Petya Osenova, Zara Kancheva, Ivaylo Radev, Ranka Stankovic, Andrej Perdih, Dejan Gabrovsek. 3232-3242 [doi]
- A Broad-Coverage Deep Semantic Lexicon for VerbsJames F. Allen, Hannah An, Ritwik Bose, William de Beaumont, Choh-Man Teng. 3243-3251 [doi]
- Computational Etymology and Word EmergenceWinston Wu, David Yarowsky. 3252-3259 [doi]
- A Dataset of Translational Equivalents Built on the Basis of plWordNet-Princeton WordNet Synset MappingEwa Rudnicka, Tomasz Naskret. 3260-3264 [doi]
- TRANSLIT: A Large-scale Name Transliteration ResourceFernando Benites, Gilbert François Duivesteijn, Pius von Däniken, Mark Cieliebak. 3265-3271 [doi]
- Computing with Subjectivity LexiconsCaio Libânio Melo Jerônimo, Cláudio Elízio Calazans Campelo, Leandro Balby Marinho, Allan Sales da Costa Melo, Adriano Veloso, Roberta Viola. 3272-3280 [doi]
- The ACoLi Dictionary GraphChristian Chiarcos, Christian Fäth, Maxim Ionov. 3281-3290 [doi]
- Resources in Underrepresented Languages: Building a Representative Romanian CorpusLudmila Midrigan-Ciochina, Victoria Boyd, Lucila Sanchez-Ortega, Diana Malancea-Malac, Doina Midrigan, David P. Corina. 3291-3296 [doi]
- World Class Language Technology - Developing a Language Technology Strategy for DanishSabine Kirchmeier, Bolette S. Pedersen, Sanni Nimb, Philip Diderichsen, Peter Juel Henrichsen. 3297-3301 [doi]
- A Corpus for Automatic Readability Assessment and Text Simplification of GermanAlessia Battisti, Dominik Pfütze, Andreas Säuberli, Marek Kostrzewa, Sarah Ebling. 3302-3311 [doi]
- The CLARIN Knowledge Centre for Atypical Communication ExpertiseHenk van den Heuvel, Nelleke Oostdijk, Caroline F. Rowland, Paul Trilsbeek. 3312-3316 [doi]
- Corpora of Disordered Speech in the Light of the GDPR: Two Use Cases from the DELAD InitiativeHenk van den Heuvel, Aleksei Kelli, Katarzyna Klessa, Satu Salaasti. 3317-3321 [doi]
- The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual EuropeGeorg Rehm, Katrin Marheinecke, Stefanie Hegele, Stelios Piperidis, Kalina Bontcheva, Jan Hajic, Khalid Choukri, Andrejs Vasiljevs, Gerhard Backfried, Christoph Prinz, José Manuél Gómez-Pérez, Luc Meertens, Paul Lukowicz, Josef van Genabith, Andrea Lösch, Philipp Slusallek, Morten Irgens, Patrick Gatellier, Joachim Köhler, Laure Le Bars, Dimitra Anastasiou, Albina Auksoriute, Núria Bel, António Branco, Gerhard Budin, Walter Daelemans, Koenraad De Smedt, Radovan Garabík, Maria Gavriilidou, Dagmar Gromann, Svetla Koeva, Simon Krek, Cvetana Krstev, Krister Lindén, Bernardo Magnini, Jan Odijk, Maciej Ogrodniczuk, Eiríkur Rögnvaldsson, Mike Rosner, Bolette S. Pedersen, Inguna Skadina, Marko Tadic, Dan Tufis, Tamás Váradi, Kadri Vider, Andy Way, François Yvon. 3322-3332 [doi]
- A Framework for Shared Agreement of Language Tags beyond ISO 639Frances Gillis-Webber, Sabine Tittel. 3333-3339 [doi]
- Gigafida 2.0: The Reference Corpus of Written Standard SloveneSimon Krek, Spela Arhar Holdt, Tomaz Erjavec, Jaka Cibej, Andraz Repar, Polona Gantar, Nikola Ljubesic, Iztok Kosem, Kaja Dobrovoljc. 3340-3345 [doi]
- Corpus Query Lingua Franca part II: OntologyStefan Evert, Oleg Harlamov, Philipp Heinrich, Piotr Banski. 3346-3352 [doi]
- A CLARIN Transcription Portal for Interview DataChristoph Draxler, Henk van den Heuvel, Arjan van Hessen, Silvia Calamai, Louise Corti. 3353-3359 [doi]
- Ellogon Casual Annotation InfrastructureGeorgios Petasis, Leonidas Tsekouras. 3360-3365 [doi]
- European Language Grid: An OverviewGeorg Rehm, Maria Berger, Ela Elsholz, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Stelios Piperidis, Miltos Deligiannis, Dimitris Galanis, Katerina Gkirtzou, Penny Labropoulou, Kalina Bontcheva, David Jones, Ian Roberts, Jan Hajic, Jana Hamrlová, Lukás Kacena, Khalid Choukri, Victoria Arranz, Andrejs Vasiljevs, Orians Anvari, Andis Lagzdins, Julija Melnika, Gerhard Backfried, Erinç Dikici, Miroslav Janosik, Katja Prinz, Christoph Prinz, Severin Stampler, Dorothea Thomas-Aniola, José Manuél Gómez-Pérez, Andrés García-Silva, Christian Berrio, Ulrich Germann, Steve Renals, Ondrej Klejch. 3366-3380 [doi]
- The Competitiveness Analysis of the European Language Technology MarketAndrejs Vasiljevs, Inguna Skadina, Indra Samite, Kaspars Kaulins, Eriks Ajausks, Julija Melnika, Aivars Berzins. 3381-3389 [doi]
- Constructing a Bilingual Hadith Corpus Using a Segmentation ToolShatha Altammami, Eric Atwell, Ammar Alsalka. 3390-3398 [doi]
- Facilitating Corpus Usage: Making Icelandic Corpora More Accessible for Researchers and Language UsersSteinþór Steingrímsson, Starkaður Barkarson, Gunnar Thor Örnólfsson. 3399-3405 [doi]
- Interoperability in an Infrastructure Enabling Multidisciplinary Research: The case of CLARINFranciska de Jong, Bente Maegaard, Darja Fiser, Dieter Van Uytvanck, Andreas Witt. 3406-3413 [doi]
- Language Technology Programme for Icelandic 2019-2023Anna Björk Nikulásdóttir, Jón Guðnason, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Steinþór Steingrímsson. 3414-3422 [doi]
- Privacy by Design and Language ResourcesPawel Kamocki, Andreas Witt. 3423-3427 [doi]
- Making Metadata Fit for Next Generation Language Technology Platforms: The Metadata Schema of the European Language GridPenny Labropoulou, Katerina Gkirtzou, Maria Gavriilidou, Miltos Deligiannis, Dimitris Galanis, Stelios Piperidis, Georg Rehm, Maria Berger, Valérie Mapelli, Mickaël Rigault, Victoria Arranz, Khalid Choukri, Gerhard Backfried, José Manuél Gómez-Pérez, Andrés García-Silva. 3428-3437 [doi]
- Related Works in the Linguistic Data Consortium CatalogDaniel Jaquette, Christopher Cieri, Denise DiPersio. 3438-3442 [doi]
- Language Data Sharing in European Public Services - Overcoming Obstacles and Creating Sustainable Data Sharing InfrastructuresLilli Smal, Andrea Lösch, Josef van Genabith, Maria Giagkou, Thierry Declerck, Stephan Busemann. 3443-3448 [doi]
- A Progress Report on Activities at the Linguistic Data Consortium Benefitting the LREC CommunityChristopher Cieri, James Fiumara, Stephanie M. Strassel, Jonathan Wright, Denise DiPersio, Mark Liberman. 3449-3456 [doi]
- Digital Language Infrastructures - Documenting Language ActorsVerena Lyding, Alexander König, Monica Pretti. 3457-3462 [doi]
- Samrómur: Crowd-sourcing Data Collection for Icelandic Speech RecognitionDavid Erik Mollberg, Ólafur Helgi Jónsson, Sunneva Þorsteinsdóttir, Steinþór Steingrímsson, Eydís Huld Magnúsdóttir, Jón Guðnason. 3463-3467 [doi]
- Semi-supervised Development of ASR Systems for Multilingual Code-switched Speech in Under-resourced LanguagesAstik Biswas, Emre Yilmaz, Febe de Wet, Ewald van der Westhuizen, Thomas Niesler. 3468-3474 [doi]
- CLFD: A Novel Vectorization Technique and Its Application in Fake News DetectionMichail Mersinias, Stergos D. Afantenos, Georgios Chalkiadakis. 3475-3483 [doi]
- SimplifyUR: Unsupervised Lexical Text Simplification for UrduNamoos Hayat Qasmi, Haris Bin Zia, Awais Athar, Agha Ali Raza. 3484-3489 [doi]
- Jamo Pair Encoding: Subcharacter Representation-based Extreme Korean Vocabulary Compression for Efficient Subword TokenizationSangwhan Moon, Naoaki Okazaki. 3490-3497 [doi]
- Offensive Language and Hate Speech Detection for DanishGudbjartur Ingi Sigurbergsson, Leon Derczynski. 3498-3508 [doi]
- Semi-supervised Deep Embedded Clustering with Anomaly Detection for Semantic Frame InductionZheng Xin Yong, Tiago Timponi Torrent. 3509-3519 [doi]
- Search Query Language Identification Using Weak LabelingRitiz Tambi, Ajinkya Kale, Tracy Holloway King. 3520-3527 [doi]
- Automated Phonological Transcription of Akkadian Cuneiform TextAleksi Sahala, Miikka Silfverberg, Antti Arppe, Krister Lindén. 3528-3534 [doi]
- COSTRA 1.0: A Dataset of Complex Sentence TransformationsPetra Barancíková, Ondrej Bojar. 3535-3541 [doi]
- Automatic In-the-wild Dataset Annotation with Deep Generalized Multiple Instance LearningMaria Joana Correia, Isabel Trancoso, Bhiksha Raj. 3542-3550 [doi]
- How Much Data Do You Need? About the Creation of a Ground Truth for Black Letter and the Effectiveness of Neural OCRPhillip Ströbel, Simon Clematide, Martin Volk 0001. 3551-3559 [doi]
- Dirichlet-Smoothed Word Embeddings for Low-Resource SettingsJakob Jungmaier, Nora Kassner, Benjamin Roth. 3560-3565 [doi]
- On The Performance of Time-Pooling Strategies for End-to-End Spoken Language IdentificationJoão Monteiro, Md. Jahangir Alam, Tiago H. Falk. 3566-3572 [doi]
- Neural Disambiguation of Lemma and Part of Speech in Morphologically Rich LanguagesJosé María Hoya Quecedo, Maximilian Koppatz, Roman Yangarber. 3573-3582 [doi]
- Non-Linearity in Mapping Based Cross-Lingual Word EmbeddingsJiawei Zhao, Andrew Gilman. 3583-3589 [doi]
- LibriVoxDeEn: A Corpus for German-to-English Speech Translation and German Speech RecognitionBenjamin Beilharz, Xin Sun, Sariya Karimova, Stefan Riezler. 3590-3594 [doi]
- SEDAR: a Large Scale French-English Financial Domain Parallel CorpusAbbas Ghaddar, Philippe Langlais. 3595-3602 [doi]
- JParaCrawl: A Large Scale Web-Based English-Japanese Parallel CorpusMakoto Morishita, Jun Suzuki, Masaaki Nagata. 3603-3609 [doi]
- Neural Machine Translation for Low-Resourced Indian LanguagesHimanshu Choudhary, Shivansh Rao, Rajesh Rohilla. 3610-3615 [doi]
- Content-Equivalent Translated Parallel News Corpus and Extension of Domain Adaptation for NMTHideya Mino, Hideki Tanaka, Hitoshi Ito, Isao Goto, Ichiro Yamada, Takenobu Tokunaga. 3616-3622 [doi]
- NMT and PBSMT Error Analyses in English to Brazilian Portuguese Automatic TranslationsHelena de Medeiros Caseli, Marcio Lima Inácio. 3623-3629 [doi]
- Evaluation Dataset for Zero Pronoun in Japanese to English TranslationSho Shimazu, Sho Takase, Toshiaki Nakazawa, Naoaki Okazaki. 3630-3634 [doi]
- Better Together: Modern Methods Plus Traditional Thinking in NP AlignmentÁdám Kovács, Judit Ács, András Kornai, Gábor Recski. 3635-3639 [doi]
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures TranslationHaiyue Song, Raj Dabre, Atsushi Fujita, Sadao Kurohashi. 3640-3649 [doi]
- Being Generous with Sub-Words towards Small NMT ChildrenArne Defauw, Tom Vanallemeersch, Koen Van Winckel, Sara Szoc, Joachim Van den Bogaert. 3650-3656 [doi]
- Document Sub-structure in Neural Machine TranslationRadina Dobreva, Jie Zhou, Rachel Bawden. 3657-3667 [doi]
- An Evaluation Benchmark for Testing the Word Sense Disambiguation Capabilities of Machine Translation SystemsAlessandro Raganato, Yves Scherrer, Jörg Tiedemann. 3668-3675 [doi]
- MEDLINE as a Parallel Corpus: a Survey to Gain Insight on French-, Spanish- and Portuguese-speaking Authors' Abstract Writing PracticeAurélie Névéol, Antonio Jimeno-Yepes, Mariana L. Neves. 3676-3682 [doi]
- JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine TranslationZhuoyuan Mao, Fabien Cromierès, Raj Dabre, Haiyue Song, Sadao Kurohashi. 3683-3691 [doi]
- A Post-Editing Dataset in the Legal Domain: Do we Underestimate Neural Machine Translation Quality?Julia Ive, Lucia Specia, Sara Szoc, Tom Vanallemeersch, Joachim Van den Bogaert, Eduardo Farah, Christine Maroti, Artur Ventura, Maxim Khalilov. 3692-3697 [doi]
- Linguistically Informed Hindi-English Neural Machine TranslationVikrant Goyal, Pruthwik Mishra, Dipti Misra Sharma. 3698-3703 [doi]
- A Test Set for Discourse Translation from Japanese to EnglishMasaaki Nagata, Makoto Morishita. 3704-3709 [doi]
- An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource LanguagesAaron Mueller, Garrett Nicolai, Arya D. McCarthy, Dylan Lewis, Winston Wu, David Yarowsky. 3710-3718 [doi]
- TDDC: Timely Disclosure Documents CorpusNobushige Doi, Yusuke Oda, Toshiaki Nakazawa. 3719-3726 [doi]
- MuST-Cinema: a Speech-to-Subtitles corpusAlina Karakanta, Matteo Negri, Marco Turchi. 3727-3734 [doi]
- On Context Span Needed for Machine Translation EvaluationSheila Castilho, Maja Popovic, Andy Way. 3735-3742 [doi]
- A Multilingual Parallel Corpora Collection Effort for Indian LanguagesShashank Siripragrada, Jerin Philip, Vinay P. Namboodiri, C. V. Jawahar. 3743-3751 [doi]
- To Case or not to case: Evaluating Casing Methods for Neural Machine TranslationThierry Etchegoyhen, Harritxu Gete. 3752-3760 [doi]
- The MARCELL Legislative CorpusTamás Váradi, Svetla Koeva, Martin Yamalov, Marko Tadic, Bálint Sass, Bartlomiej Niton, Maciej Ogrodniczuk, Piotr Pezik, Verginica Barbu Mititelu, Radu Ion, Elena Irimia, Maria Mitrofan, Vasile Florian Pais, Dan Tufis, Radovan Garabík, Simon Krek, Andraz Repar, Matjaz Rihtar, Janez Brank. 3761-3768 [doi]
- ParaPat: The Multi-Million Sentences Parallel Corpus of Patents AbstractsFelipe Soares, Mark Stevenson, Diego Bartolomé, Anna Zaretskaya. 3769-3774 [doi]
- Corpora for Document-Level Neural Machine TranslationSiyou Liu, Xiaojun Zhang. 3775-3781 [doi]
- OpusTools and Parallel Corpus DiagnosticsMikko Aulamo, Umut Sulubacak, Sami Virpioja, Jörg Tiedemann. 3782-3789 [doi]
- Literary Machine Translation under the Magnifying Glass: Assessing the Quality of an NMT-Translated Detective Novel on Document LevelMargot Fonteyne, Arda Tezcan, Lieve Macken. 3790-3798 [doi]
- Handle with Care: A Case Study in Comparable Corpora Exploitation for Neural Machine TranslationThierry Etchegoyhen, Harritxu Gete. 3799-3807 [doi]
- The FISKMÖ Project: Resources and Tools for Finnish-Swedish Machine Translation and Cross-Linguistic ResearchJörg Tiedemann, Tommi Nieminen, Mikko Aulamo, Jenna Kanerva, Akseli Leino, Filip Ginter, Niko Papula. 3808-3815 [doi]
- Multiword Expression aware Neural Machine TranslationAndrea Zaninello, Alexandra Birch. 3816-3825 [doi]
- An Enhanced Mapping Scheme of the Universal Part-Of-Speech for KoreanMyung Hee Kim, Nathalie Colineau. 3826-3833 [doi]
- Finite State Machine Pattern-Root Arabic Morphological Generator, Analyzer and DiacritizerMaha Alkhairy, Afshan Jafri, David Smith. 3834-3841 [doi]
- An Unsupervised Method for Weighting Finite-state Morphological AnalyzersAmr Keleg, Francis M. Tyers, Nick Howell, Tommi A. Pirinen. 3842-3850 [doi]
- Language-Independent Tokenisation Rivals Language-Specific Tokenisation for Word Similarity PredictionDanushka Bollegala, Ryuichi Kiryo, Kosuke Tsujino, Haruki Yukawa. 3851-3860 [doi]
- A Supervised Part-Of-Speech Tagger for the Greek Language of the Social WebMaria Nefeli Nikiforos, Katia Lida Kermanidis. 3861-3867 [doi]
- Bag & Tag'em - A New Dutch StemmerAnne Jonker, Corné de Ruijt, Jornt de Gruijl. 3868-3876 [doi]
- Glawinette: a Linguistically Motivated Derivational Description of French Acquired from GLAWINabil Hathout, Franck Sajous, Basilio Calderone, Fiammetta Namer. 3877-3885 [doi]
- BabyFST - Towards a Finite-State Based Computational Model of Ancient BabylonianAleksi Sahala, Miikka Silfverberg, Antti Arppe, Krister Lindén. 3886-3894 [doi]
- Morphological Analysis and Disambiguation for Gulf Arabic: The Interplay between Resources and MethodsSalam Khalifa, Nasser Zalmout, Nizar Habash. 3895-3904 [doi]
- Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional CorpusEleni Metheniti, Guenter Neumann. 3905-3912 [doi]
- Introducing a Large-Scale Dataset for Vietnamese POS Tagging on Conversational TextsOanh Tran, Tu Pham, Vu Dang, Bang Nguyen. 3913-3921 [doi]
- UniMorph 3.0: Universal MorphologyArya D. McCarthy, Christo Kirov, Matteo Grella, Amrit Nidhi, Patrick Xia, Kyle Gorman, Ekaterina Vylomova, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, Timofey Arkhangelskiy, Nataly Krizhanovsky, Andrew Krizhanovsky, Elena Klyachko, Alexey Sorokin, John Mansfield, Valts Ernstreits, Yuval Pinter, Cassandra L. Jacobs, Ryan Cotterell, Mans Hulden, David Yarowsky. 3922-3931 [doi]
- Building the Spanish-Croatian Parallel CorpusBojana Mikelenic, Marko Tadic. 3932-3936 [doi]
- DerivBase.Ru: a Derivational Morphology Resource for RussianDaniil Vodolazsky. 3937-3943 [doi]
- Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and PruningStig-Arne Grönroos, Sami Virpioja, Mikko Kurimo. 3944-3953 [doi]
- Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for SerbianRanka Stankovic, Branislava Sandrih, Cvetana Krstev, Milos Utvic, Mihailo Skoric. 3954-3962 [doi]
- Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand LanguagesGarrett Nicolai, Dylan Lewis, Arya D. McCarthy, Aaron Mueller, Winston Wu, David Yarowsky. 3963-3972 [doi]
- Cairo Student Code-Switch (CSCS) Corpus: An Annotated Egyptian Arabic-English CorpusMohamed Balabel, Injy Hamed, Slim Abdennadher, Ngoc Thang Vu, Özlem Çetinoglu. 3973-3977 [doi]
- Getting More Data for Low-resource Morphological Inflection: Language Models and Data AugmentationAlexey Sorokin. 3978-3983 [doi]
- Visual Modeling of Turkish MorphologyBerke Özenç, Ercan Solak. 3984-3990 [doi]
- Kvistur 2.0: a BiLSTM Compound Splitter for IcelandicJón Daðason, David Erik Mollberg, Hrafn Loftsson, Kristín Bjarnadóttir. 3991-3995 [doi]
- Morphological Segmentation for Low Resource LanguagesJustin Mott, Ann Bies, Stephanie M. Strassel, Jordan Kodner, Caitlin Richter, Hongzhi Xu, Mitchell Marcus. 3996-4002 [doi]
- CCNet: Extracting High Quality Monolingual Datasets from Web Crawl DataGuillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin, Edouard Grave. 4003-4012 [doi]
- On the Robustness of Unsupervised and Semi-supervised Cross-lingual Word Embedding LearningYerai Doval, José Camacho-Collados, Luis Espinosa Anke, Steven Schockaert. 4013-4023 [doi]
- Building an English-Chinese Parallel Corpus Annotated with Sub-sentential Translation TechniquesYuming Zhai, Lufei Liu, Xinyi Zhong, Gabriel Illouz, Anne Vilnat. 4024-4033 [doi]
- Universal Dependencies v2: An Evergrowing Multilingual Treebank CollectionJoakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajic, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis M. Tyers, Daniel Zeman. 4034-4043 [doi]
- EMPAC: an English-Spanish Corpus of Institutional SubtitlesIris Serrat Roozen, José Manuel Martínez Martínez. 4044-4053 [doi]
- Cross-Lingual Word Embeddings for Turkic LanguagesElmurod Kuriyozov, Yerai Doval, Carlos Gómez-Rodríguez. 4054-4062 [doi]
- How Universal are Universal Dependencies? Exploiting Syntax for Multilingual Clause-level Sentiment DetectionHiroshi Kanayama, Ran Iwamoto. 4063-4073 [doi]
- Multilingual Culture-Independent Word Analogy DatasetsMatej Ulcar, Kristiina Vaik, Jessica Lindström, Milda Dailidenaite, Marko Robnik-Sikonja. 4074-4080 [doi]
- GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia BiographiesMarta R. Costa-Jussà, Pau Li Lin, Cristina España-Bonet. 4081-4088 [doi]
- SpiCE: A New Open-Access Corpus of Conversational Bilingual Speech in Cantonese and EnglishKhia A Johnson, Molly Babel, Ivan Fong, Nancy Yiu. 4089-4095 [doi]
- Identifying Cognates in English-Dutch and French-Dutch by means of Orthographic Information and Cross-lingual Word EmbeddingsEls Lefever, Sofie Labat, Pranaydeep Singh. 4096-4101 [doi]
- Lexicogrammatic translationese across two targets and competence levelsMaria Kunilovskaya, Ekaterina Lapshinova-Koltunski. 4102-4112 [doi]
- UniSent: Universal Adaptable Sentiment Lexica for 1000+ LanguagesEhsaneddin Asgari, Fabienne Braune, Benjamin Roth, Christoph Ringlstetter, Mohammad R. K. Mofrad. 4113-4120 [doi]
- CanVEC - the Canberra Vietnamese-English Code-switching Natural Speech CorpusLi Nguyen, Christopher Bryant. 4121-4129 [doi]
- A Spelling Correction Corpus for Multiple Arabic DialectsFadhl Eryani, Nizar Habash, Houda Bouamor, Salam Khalifa. 4130-4138 [doi]
- A Dataset for Multi-lingual Epidemiological Event ExtractionStephen Mutuvi, Antoine Doucet, Gaël Lejeune, Moses Odeo. 4139-4144 [doi]
- Swiss-AL: A Multilingual Swiss Web Corpus for Applied LinguisticsJulia Krasselt, Philipp Dressen, Matthias Fluor, Cerstin Mahlow, Klaus Rothenhäusler, Maren Runte. 4145-4151 [doi]
- Analysis of GlobalPhone and Ethiopian Languages Speech Corpora for Multilingual ASRMartha Yifiru Tachbelie, Solomon Teferra Abate, Tanja Schultz. 4152-4156 [doi]
- Multilingualization of Medical Terminology: Semantic and Structural Embedding ApproachesLong-Huei Chen, Kyo Kageura. 4157-4166 [doi]
- Large Vocabulary Read Speech Corpora for Four Ethiopian Languages: Amharic, Tigrigna, Oromo and WolayttaSolomon Teferra Abate, Martha Yifiru Tachbelie, Michael Melese, Hafte Abera, Tewodros Abebe, Wondwossen Mulugeta, Yaregal Assabie, Million Meshesha, Solomon Afnafu, Binyam Ephrem Seyoum. 4167-4171 [doi]
- Incorporating Politeness across Languages in Customer Care Responses: Towards building a Multi-lingual Empathetic Dialogue AgentMauajama Firdaus, Asif Ekbal, Pushpak Bhattacharyya. 4172-4182 [doi]
- WikiBank: Using Wikidata to Improve Multilingual Frame-Semantic ParsingCezar Sas, Meriem Beloucif, Anders Søgaard. 4183-4189 [doi]
- Multilingual Corpus Creation for Multilingual Semantic Similarity TaskMahtab Ahmed, Chahna Dixit, Robert E. Mercer, Atif Khan, Muhammad Rifayat Samee, Felipe Urra. 4190-4196 [doi]
- CoVoST: A Diverse Multilingual Speech-To-Text Translation CorpusChanghan Wang, Juan Miguel Pino, Anne Wu, Jiatao Gu. 4197-4203 [doi]
- A Visually-Grounded Parallel Corpus with Phrase-to-Region LinkingHideki Nakayama, Akihiro Tamura, Takashi Ninomiya. 4204-4210 [doi]
- Multilingual Dictionary Based Construction of Core VocabularyWinston Wu, Garrett Nicolai, David Yarowsky. 4211-4217 [doi]
- Common Voice: A Massively-Multilingual Speech CorpusRosana Ardila, Megan Branson, Kelly Davis, Michael Kohler, Josh Meyer, Michael Henretty, Reuben Morais, Lindsay Saunders, Francis M. Tyers, Gregor Weber. 4218-4222 [doi]
- Massively Multilingual Pronunciation Modeling with WikiPronJackson L. Lee, Lucas F. E. Ashby, M. Elizabeth Garza, Yeonju Lee-Sikka, Sean Miller, Alan Wong, Arya D. McCarthy, Kyle Gorman. 4223-4228 [doi]
- HELFI: a Hebrew-Greek-Finnish Parallel Bible Corpus with Cross-Lingual Morpheme AlignmentAnssi Yli-Jyrä, Josi Purhonen, Matti Liljeqvist, Arto Antturi, Pekka Nieminen, Kari M. Räntilä, Valtter Luoto. 4229-4236 [doi]
- ArzEn: A Speech Corpus for Code-switched Egyptian Arabic-EnglishInjy Hamed, Ngoc Thang Vu, Slim Abdennadher. 4237-4246 [doi]
- Cross-lingual Named Entity List Search via TransliterationAleksandr Khakhmovich, Svetlana V. Pavlova, Kira Kirillova, Nikolay Arefyev, Ekaterina Savilova. 4247-4255 [doi]
- Serial Speakers: a Dataset of TV SeriesXavier Bost, Vincent Labatut, Georges Linarès. 4256-4264 [doi]
- Image Position Prediction in Multimodal DocumentsMasayasu Muraoka, Ryosuke Kohita, Etsuko Ishii. 4265-4274 [doi]
- Visual Grounding Annotation of Recipe Flow GraphTaichi Nishimura, Suzushi Tomori, Hayato Hashimoto, Atsushi Hashimoto, Yoko Yamakata, Jun Harashima, Yoshitaka Ushiku, Shinsuke Mori. 4275-4284 [doi]
- Building a Multimodal Entity Linking Dataset From TweetsOmar Adjali, Romaric Besançon, Olivier Ferret, Hervé Le Borgne, Brigitte Grau. 4285-4292 [doi]
- A Multimodal Educational Corpus of Oral Courses: Annotation, Analysis and Case StudySalima Mdhaffar, Yannick Estève, Antoine Laurent, Nicolas Hernandez, Richard Dufour, Delphine Charlet, Géraldine Damnati, Solen Quiniou, Nathalie Camelin. 4293-4301 [doi]
- Annotating Event Appearance for Japanese Chess Commentary CorpusHirotaka Kameko, Shinsuke Mori. 4302-4308 [doi]
- Offensive Video Detection: Dataset and Baseline ResultsCleber Alcântara, Viviane Pereira Moreira, Diego de Vargas Feijó. 4309-4319 [doi]
- Adding Gesture, Posture and Facial Displays to the PoliModal Corpus of Political InterviewsDaniela Trotta, Alessio Palmero Aprosio, Sara Tonelli, Annibale Elia. 4320-4326 [doi]
- E:Calm Resource: a Resource for Studying Texts Produced by French Pupils and StudentsLydia-Mai Ho-Dac, Serge Fleury, Claude Ponton. 4327-4332 [doi]
- Introducing MULAI: A Multimodal Database of Laughter during Dyadic InteractionsMichel-Pierre Jansen, Khiet P. Truong, Dirk K. J. Heylen, Deniece S. Nazareth. 4333-4342 [doi]
- The Connection between the Text and Images of News Articles: New Insights for Multimedia AnalysisNelleke Oostdijk, Hans van Halteren, Erkan Basar, Martha A. Larson. 4343-4351 [doi]
- LifeQA: A Real-life Dataset for Video Question AnsweringSantiago Castro, Mahmoud Azab, Jonathan C. Stroud, Cristina Noujaim, Ruoyao Wang, Jia Deng, Rada Mihalcea. 4352-4358 [doi]
- A Domain-Specific Dataset of Difficulty Ratings for German Noun Compounds in the Domains DIY, Cooking and AutomotiveJulia Bettinger, Anna Hätty, Michael Dorna, Sabine Schulte im Walde. 4359-4367 [doi]
- All That Glitters is Not Gold: A Gold Standard of Adjective-Noun Collocations for GermanYana Strakatova, Neele Falk, Isabel Fuhrmann, Erhard W. Hinrichs, Daniela Rossmann. 4368-4378 [doi]
- Variants of Vector Space Reductions for Predicting the Compositionality of English Noun CompoundsPegah Alipoor, Sabine Schulte im Walde. 4379-4387 [doi]
- Varying Vector Representations and Integrating Meaning Shifts into a PageRank Model for Automatic Term ExtractionAnurag Nigam, Anna Hätty, Sabine Schulte im Walde. 4388-4394 [doi]
- Rigor Mortis: Annotating MWEs with a Gamified PlatformKarën Fort, Bruno Guillaume, Yann-Alan Pilatte, Mathieu Constant, Nicolas Lefebvre. 4395-4401 [doi]
- A Multi-word Expression Dataset for SwedishMurathan Kurfali, Robert Östling, Johan Sjons, Mats Wirén. 4402-4409 [doi]
- A Joint Approach to Compound Splitting and Idiomatic Compound DetectionIrina Krotova, Sergey Aksenov, Ekaterina Artemova. 4410-4417 [doi]
- Dedicated Language Resources for Interdisciplinary Research on Multiword Expressions: Best Thing since Sliced BreadFerdy Hubers, Catia Cucchiarini, Helmer Strik. 4418-4425 [doi]
- Detecting Multiword Expression Type Helps Lexical Complexity AssessmentEkaterina Kochmar, Sian Gooding, Matthew Shardlow. 4426-4435 [doi]
- Introducing RONEC - the Romanian Named Entity CorpusStefan Daniel Dumitrescu, Andrei-Marius Avram. 4436-4443 [doi]
- A Semi-supervised Approach for De-identification of Swedish Clinical TextHanna Berg, Hercules Dalianis. 4444-4450 [doi]
- A Chinese Corpus for Fine-grained Entity TypingChin Lee, Hongliang Dai, Yangqiu Song, Xin Li. 4451-4457 [doi]
- Czech Historical Named Entity Corpus v 1.0Helena Hubková, Pavel Král, Eva Pettersson. 4458-4465 [doi]
- CodE Alltag 2.0 - A Pseudonymized German-Language Email CorpusElisabeth Eder, Ulrike Krieg-Holz, Udo Hahn. 4466-4477 [doi]
- A Dataset of German Legal Documents for Named Entity RecognitionElena Leitner, Georg Rehm, Julián Moreno Schneider. 4478-4485 [doi]
- Sensitive Data Detection and Classification in Spanish Clinical Text: Experiments with BERTAitor García Pablos, Naiara Pérez, Montse Cuadros. 4486-4494 [doi]
- Named Entities in Medical Case Reports: Corpus and Experiments