- Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers). Vera Demberg, Kentaro Inui, Lluís Màrquez.
- Investigating the Multilingual Calibration Effects of Language Model Instruction Tuning. Jerry Huang, Peng Lu, Qiuhao Zeng, Yusuke Iwasawa, Yutaka Matsuo, Sarath Chandar, Edison Marrese-Taylor, Irene Li. 1-59
- Detecting Subtle Sense Shift with Polysemy-Aware Trends. Ondrej Herman, Pavel Rychlý. 60-65
- Logic Haystacks: Probing LLMs' Long-Context Logical Reasoning (Without Easily Identifiable Unrelated Padding). Damien Sileo. 66-75
- When Does Auxiliary Modality Matter in Solving Geometric Problems? A Comprehensive Study of Textual, Formal, and Visual Modalities. Hyuk Namgoong, Jeesu Jung, Yerim Han, Sangkeun Jung. 76-92
- Progressive Visual Refinement for Multi-modal Summarization. Ye Xiong, Hidetaka Kamigaito, Soichiro Murakami, Peinan Zhang, Hiroya Takamura, Manabu Okumura. 93-104
- Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties. Zhenglin Wang, Jialong Wu 0007, Pengfei Li, Yong Jiang 0005, Deyu Zhou. 105-120
- Training in Step-by-Step Formal Reasoning Improves Pronominal Reasoning in Language Models. Vagrant Gautam. 121-135
- When Words Wear Masks: Detecting Malicious Intents and Hostile Impacts of Online Hate Speech. Priyansh Singhal, Piyush Joshi. 136-153
- Lost in Activations: A Neuron-level Analysis of Encoders for Cross-Lingual Emotion Detection. Pranaydeep Singh, Orphée De Clercq, Els Lefever. 154-159
- McMining: Automated Discovery of Misconceptions in Student Code. Erfan Al-Hossami, Razvan C. Bunescu. 160-178
- Surprisal from Larger Transformer-based Language Models Predicts fMRI Data More Poorly. Yi-Chien Lin, William Schuler. 179-186
- WebRollback: Enhancing Web Agents with Explicit Rollback Mechanisms. Zhisong Zhang, Tianqing Fang, Kaixin Ma, Wenhao Yu 0002, Hongming Zhang 0009, Haitao Mi, Dong Yu 0001. 187-197
- Hacking Neural Evaluation Metrics with Single Text. Hiroyuki Deguchi 0002, Katsuki Chousa, Yusuke Sakai 0010. 198-206
- To Paraphrase or Not: Efficient Comment Detoxification with Unsupervised Detoxifiability Discrimination. Jing Ke, Zheyong Xie, Shaosheng Cao, Tong Xu 0001, Enhong Chen. 207-213
- Hey, wait a minute: on at-issue sensitivity in Language Models. Sanghee J. Kim, Kanishka Misra. 214-224
- Exploring Cross-Lingual Voice Conversion Methods for Anonymizing Low-Resource Text-to-Speech. Shenran Wang, Aidan Pine, Mengzhe Geng. 225-231
- Korean Canonical Legal Benchmark: Toward Knowledge-Independent Evaluation of LLMs' Legal Reasoning Capabilities. Hongseok Oh, Wonseok Hwang, Kyoung-woon On. 232-243
- Task-Level Instructions Induction for Audio Question Answering from Few Examples. Po-Chun Chen, Hen-Hsen Huang, Hsin-Hsi Chen. 244-264
- Optical Character Recognition for the International Phonetic Alphabet. Shu Okabe, Dejvi Zelo, Alexander Fraser 0001. 265-273
- How DDAIR you? Disambiguated Data Augmentation for Intent Recognition. Galo Castillo-López, Alexis Lombard, Nasredine Semmar, Gaël de Chalendar. 274-286
- Measuring Linguistic Competence of LLMs on Indigenous Languages of the Americas. Justin Vasselli, Arturo Mp, Frederikus Hudi, Haruki Sakajo, Taro Watanabe. 287-296
- Morpheme Matters: Morpheme-Based Subword Tokenization for Korean Language Models. Donghyeok Lee 0002, Jeongyeon Park, Kyungbeen Cho, Jae Sung Lee. 297-306
- Communication Enables Cooperation in LLM Agents: A Comparison with Curriculum-Based Approaches. Hachem Madmoun, Salem Lahlou. 307-321
- Mind Your Special Tokens! On the Importance of Dedicated Sequence-End Tokens in Vision-Language Embedding Models. Elio Musacchio, Giovanni Semeraro, Goran Glavas. 322-328
- The Reasoning Lingua Franca: A Double-Edged Sword for Multilingual AI. Alan Saji, Raj Dabre, Anoop Kunchukuttan, Ratish Puduppully. 329-344
- When Flores Bloomz Wrong: Cross-Direction Contamination in Machine Translation Evaluation. David Tan, Pinzhen Chen, Josef van Genabith, Koel Dutta Chowdhury. 345-358
- SAFARI: A Community-Engaged Approach and Dataset of Stereotype Resources in the Sub-Saharan African Context. Aishwarya Verma, Laud Ammah, Olivia Nercy Ndlovu Lucas, Andrew Zaldivar, Vinodkumar Prabhakaran, Sunipa Dev. 359-370
- STREAM-ZH: Simplified Topic Retrieval Exploration and Analysis Module for Chinese Language. Hongyi Li, Jianjun Lian, Anton Frederik Thielmann, Andre Python. 371-383
- MiSCHiEF: A Benchmark in Minimal-Pairs of Safety and Culture for Holistic Evaluation of Fine-Grained Image-Caption Alignment. Sagarika Banerjee, Tangatar Madi, Advait Swaminathan, Jolie Nguyen, Shivank Garg, Kevin Zhu, Vasu Sharma. 384-406
- Exploring Generative Process Reward Modeling for Semi-Structured Data: A Case Study of Table Question Answering. Lei Tang, Wei Zhou, Mohsen Mesgar. 407-420
- Out of Distribution, Out of Luck: Process Rewards Misguide Reasoning Models. Alexey Dontsov, Anton Korznikov, Andrey V. Galichin, Elena Tutubalina. 421-435
- Learning to Ideate for Machine Learning Engineering Agents. Yunxiang Zhang, Kang Zhou, Zhichao Xu 0001, Kiran Ramnath, Yun Zhou, Sangmin Woo, Haibo Ding, Lin Lee Cheong. 436-447
- Infinity-MoE: Generalizing Mixture of Experts to Infinite Experts. Shota Takashiro, Takeshi Kojima, Shohei Taniguchi, Yusuke Iwasawa, Yutaka Matsuo. 448-456
- Beyond Tokens: Concept-Level Training Objectives for LLMs. Laya Iyer, Pranav Somani, Alice Guo, Dan Jurafsky, Chen Shani. 457-474
- Persuasion Tokens for Editing Factual Knowledge in LLMs. Paul Youssef, Jörg Schlötterer, Christin Seifert. 475-486
- Language Family Matters: Evaluating SpeechLLMs Across Linguistic Boundaries. Yuchen Zhang, Ravi Shekhar, Haralambos Mouratidis. 487-499
- When Benchmarks Age: Temporal Misalignment through Large Language Model Factuality Evaluation. Xunyi Jiang, Dingyi Chang, Julian J. McAuley, Xin Xu 0010. 500-512
- On the Additive Compositionality of Task Vectors in Vision-Language Models. Yuting Shi, Houjing Wei, Naoya Inoue. 513-521
- Funny or Persuasive, but Not Both: Evaluating Fine-Grained Multi-Concept Control in LLMs. Arya Labroo, Ivaxi Sheth, Vyas Raina, Amaani Ahmed, Mario Fritz. 522-554
- Common Sense or Ableism? Rethinking Commonsense Reasoning Through the Lens of Disability. Karina Halevy, Kimi Wenzel, Seyun Kim, Kyle Dean Bauer, Bruno Neira, Mona T. Diab, Maarten Sap. 555-571
- Machine translation Evaluation Eng-Thai MQM Ranking dataset. Phichet Phuangrot, Natdanai Trintawat, Kanawat Vilasri, Yanapat Patcharawiwatpong, Pachara Boonsarngsuk, Nat Pavasant, Ekapol Chuangsuwanich. 572-587
- Evaluation of Deontic Conditional Reasoning in Large Language Models: The Case of Wason's Selection Task. Hirohiko Abe, Kentaro Ozeki, Risako Ando, Takanobu Morishita, Koji Mineshima, Mitsuhiro Okada 0001. 588-601
- Confidence Leaps in LLM Reasoning: Early Stopping and Cross-Model Transfer. Pavel Tikhonov, Ivan V. Oseledets, Elena Tutubalina. 602-616
- Exploring Fine-Tuning for In-Context Retrieval and Efficient KV-Caching in Long-Context Language Models. Francesco Maria Molfese, Momchil Hardalov, Rexhina Blloshmi, Bill Byrne, Adrià de Gispert. 617-635
- Post-ASR Correction in Hindi: Comparing Language Models and Large Language Models in Low-Resource Scenarios. Rishabh Kumar, Amrith Krishna, Ganesh Ramakrishnan, Preethi Jyothi. 636-645
- CHiRPE: A Step Towards Real-World Clinical NLP with Clinician-Oriented Model Explanations. Stephanie Fong, Zimu Wang, Guilherme C. Oliveira, Xiangyu Zhao, Yiwen Jiang, Jiahe Liu, Beau-Luke Colton, Scott W. Woods, Martha Shenton, Barnaby Nelson, ZongYuan Ge, Dominic Dwyer. 646-658
- LLMs Know More About Numbers than They Can Say. Fengting Yuchi, Li Du, Jason Eisner. 659-673
- On the Mathematical Relationship Between Layer Normalization and Dynamic Activation Functions. Felix Stollenwerk. 674-681
- Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models. Qi Cao, Andrew Gambardella, Takeshi Kojima, Yutaka Matsuo, Yusuke Iwasawa. 682-696
- Becoming Experienced Judges: Selective Test-Time Learning for Evaluators. Seungyeon Jwa, Daechul Ahn, Reokyoung Kim, Dongyeop Kang, Jonghyun Choi. 697-721
- Statistical Foundations of DIME: Risk Estimation for Practical Index Selection. Giulio D'Erasmo, Cesare Campagnano, Antonio Mallia, Pierpaolo Brutti, Nicola Tonellotto, Fabrizio Silvestri. 722-730