- LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts. Yang Liu, Jiaye Yang, Weikang Li, Jiahui Liang, Yang Li, Lingyong Yan. 1-22
- Teams of LLM Agents can Exploit Zero-Day Vulnerabilities. Yuxuan Zhu 0003, Antony Kellermann, Akul Gupta, Philip Li, Richard Fang, Rohan Bindu, Daniel Kang 0001. 23-35
- Can Reasoning Help Large Language Models Capture Human Annotator Disagreement? Jingwei Ni, Yu Fan 0007, Vilém Zouhar, Donya Rooein, Alexander Miserlis Hoyle, Mrinmaya Sachan, Markus Leippold, Dirk Hovy, Elliott Ash. 36-54
- Early-Exit and Instant Confidence Translation Quality Estimation. Vilém Zouhar, Maike Züfle, Beni Egressy, Julius Cheng, Mrinmaya Sachan, Jan Niehues. 55-76
- GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval. Justus-Jonas Erker, Nils Reimers 0001, Iryna Gurevych. 77-94
- SCoPE VLM: Selective Context Processing for Efficient Document Navigation in Vision-Language Models. Gyubeum Lim, Yemo Koo, Vijay Krishna Madisetti. 95-140
- Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis. Shuhaib Mehri, Xiusi Chen, Heng Ji 0001, Dilek Hakkani-Tür. 141-164
- T2-RAGBench: Text-and-Table Aware Retrieval-Augmented Generation. Jan Strich, Enes Kutay Isgorur, Maximilian Trescher, Chris Biemann, Martin Semmann. 165-191
- The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models. Kefan Yu, Qingcheng Zeng, Weihao Xuan, Wanxin Li, Jingyi Wu, Rob Voigt. 192-213
- Hierarchical Text Classification with LLM-Refined Taxonomies. Jonas Golde, Nicolaas Paul Jedema, RaviKiran Krishnan, Phong Le. 214-228
- Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning. Chengsong Huang, Langlin Huang, Jiaxin Huang 0001. 229-249
- Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models. Sarah Ball, Frauke Kreuter, Nina Panickssery. 250-279
- Out of Style: RAG's Fragility to Linguistic Variation. Tianyu Cao 0006, Neel Bhandari, Akhila Yerukola, Akari Asai, Maarten Sap. 280-318
- Do Political Opinions Transfer Between Western Languages? An Analysis of Unaligned and Aligned Multilingual LLMs. Franziska Weeber, Tanise Ceron, Sebastian Padó. 319-340
- H-MEM: Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents. Haoran Sun, Shaoning Zeng, Bob Zhang. 341-350
- MULSUM: A Multimodal Summarization System with Vis-Aligner and Diversity-Aware Image Selection. Abid Ali, Diego Mollá, Usman Naseem. 351-362
- How Quantization Shapes Bias in Large Language Models. Federico Marcuzzi, Xuefei Ning, Roy Schwartz 0001, Iryna Gurevych. 363-404
- If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models. Jasmin Orth, Philipp Mondorf, Barbara Plank. 405-427
- The Dog the Cat Chased Stumped the Model: Measuring When Language Models Abandon Structure for Shortcuts. Sangmitra Madhusudan, Kaige Chen, Ali Emami. 428-453
- Automated Screening of Antibacterial Nanoparticle Literature: Dataset Curation and Model Evaluation. Alperen Ozturk, Saziye Betül Özates, Sophia Bahar Root, Angela Violi, Nicholas A. Kotov, J. Scott Vanepps, Emine Sumeyra Turali-Emre. 454-465
- Intention Knowledge Graph Construction for User Intention Relation Modeling. Jiaxin Bai, Zhaobo Wang, Junfei Cheng, Dan Yu, Zerui Huang, Weiqi Wang 0001, Xin Liu, Chen Luo 0003, Yanming Zhu, Bo Li 0005, Yangqiu Song. 466-484
- Analogical Structure, Minimal Contextual Cues and Contrastive Distractors: Input Design for Sample-Efficient Linguistic Rule Induction. Chunyang Jiang, Paola Merlo. 485-500
- JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models' Detection of Human risky health behavior Content in Jirai Community. Yunze Xiao, Tingyu He, Lionel Z. Wang, Yiming Ma, Xingyu Song, Xiaohang Xu 0002, Mona T. Diab, Irene Li, Ka Chung Ng. 501-517
- Chandomitra: Towards Generating Structured Sanskrit Poetry from Natural Language Inputs. Manoj Balaji Jagadeeshan, Samarth Bhatia, Pretam Ray, Harshul Raj Surana, Akhil Rajeev P, Priya Mishra, Annarao Kulkarni, Ganesh Ramakrishnan, Prathosh A. P., Pawan Goyal 0002. 518-534
- Tailored Emotional LLM-Supporter: Enhancing Cultural Sensitivity. Chen Cecilia Liu, Hiba Arnaout, Nils Kovacic, Dana Atzil-Slonim, Iryna Gurevych. 535-574
- Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs. Hussein Abdallah, Ibrahim Abdelaziz, Panos Kalnis, Essam Mansour 0001. 575-592
- Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models. David Guzman Piedrahita, Irene Strauss, Rada Mihalcea, Zhijing Jin 0001. 593-652
- PromptFE: Automated Feature Engineering by Prompting. Yufeng Zou, Jean Utke, Diego Klabjan, Han Liu 0001. 653-681
- Detecting (Un)answerability in Large Language Models with Linear Directions. Maor Juliet Lavi, Tova Milo, Mor Geva. 682-699
- Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning. Sanghwan Bae, Jiwoo Hong, Min-Young Lee, Hanbyul Kim, JeongYeon Nam, Donghyun Kwak. 700-719
- BERT, are you paying attention? Attention regularization with human-annotated rationales. Elize Herrewijnen, Dong Nguyen 0002, Floris Bex, Albert Gatt. 720-751
- Humans and transformer LMs: Abstraction drives language learning. Jasper Jian, Christopher D. Manning. 752-765
- BigTokDetect: A Clinically-Informed Vision-Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok. Minh Duc Chu, Kshitij Pawar, Zihao He, Roxanna Sharifi, Ross M. Sonnenblick, Magdalayna Curry, Laura D'Adamo, Lindsay Young, Stuart B. Murray, Kristina Lerman. 766-790
- Do language models accommodate their users? A study of linguistic convergence. Terra Blevins, Susanne Sophie Schmalwieser, Benjamin Roth 0001. 791-807
- Auditing Language Model Unlearning via Information Decomposition. Anmol Goel, Alan Ritter, Iryna Gurevych. 808-826
- OD-Stega: LLM-Based Relatively Secure Steganography via Optimized Distributions. Yu-Shin Huang, Peter Just, Hanyun Yin, Krishna Narayanan 0001, Ruihong Huang, Chao Tian 0002. 827-851
- Sparse Adapter Fusion for Continual Learning in NLP. Min Zeng, Xi Chen, Haiqin Yang, Yike Guo. 852-863
- Rethinking Prompt Optimizers: From Prompt Merits to Optimization. Zixiao Zhu, Hanzhang Zhou, Zijian Feng, Tianjiao Li, Chua Jia Jim Deryl, Lee Onn Mak, Gee Wah Ng, Kezhi Mao. 864-892
- A Survey on Multilingual Mental Disorders Detection from Social Media Data. Ana-Maria Bucur, Marcos Zampieri, Tharindu Ranasinghe, Fabio Crestani. 893-918
- Identifying Fine-grained Forms of Populism in Political Discourse: A Case Study on Donald Trump's Presidential Campaigns. Ilias Chalkidis, Stephanie Brandl, Paris Aslanidis. 919-936
- SCoNE: a Self-Correcting and Noise-Augmented Method for Complex Biological and Chemical Named Entity Recognition. Xingyu Zhu, Claire Nédellec, Balázs Nagy 0004, László Vidács, Robert Bossy. 937-952
- A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models. Iwona Christop, Mateusz Czyznikiewicz, Pawel Skórzewski, Lukasz Bondaruk, Jakub Kubiak, Marcin Lewandowski, Marek Kubis. 953-983
- Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning. Syeda Nahida Akter, Shrimai Prabhumoye, Matvei Novikov, Seungju Han 0002, Ying Lin, Evelina Bakhturina, Eric Nyberg, Yejin Choi 0001, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro. 984-1002
- Safety of Large Language Models Beyond English: A Systematic Literature Review of Risks, Biases, and Safeguards. Aleksandra Krasnodebska, Katarzyna Dziewulska, Karolina Seweryn, Maciej Chrabaszcz, Wojciech Kusa. 1003-1034
- InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection. Yuhang Liu, Pengxiang Li, Zishu Wei, Congkai Xie, Xueyu Hu, Xinchen Xu, Shengyu Zhang 0001, Xiaotian Han, Hongxia Yang, Fei Wu 0001. 1035-1051
- Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish. Yakup Abrek Er, Ilker Kesen, Gözde Gül Sahin, Aykut Erdem. 1052-1085
- CALE : Concept-Aligned Embeddings for Both Within-Lemma and Inter-Lemma Sense Differentiation. Bastien Liétard, Gabriel Loiseau. 1086-1100
- Do NOT Classify and Count: Hybrid Attribute Control Success Evaluation. Felix Matthias Saaro, Pius von Däniken, Mark Cieliebak, Jan Milan Deriu. 1101-1114
- Detecting Training Data of Large Language Models via Expectation Maximization. Gyuwan Kim, Yang Li, Evangelia Spiliopoulou, Jie Ma, William Yang Wang. 1115-1129
- How effective are VLMs in assisting humans in inferring the quality of mental models from Multimodal short answers? Pritam Sil, Durgaprasad Karnam, Vinay Reddy Venumuddala, Pushpak Bhattacharyya. 1130-1140
- Don't Trust Generative Agents to Mimic Communication on Social Networks Unless You Benchmarked their Empirical Realism. Simon Münker, Nils Schwager, Achim Rettinger. 1141-1151
- Persona Prompting as a Lens on LLM Social Reasoning. Jing Yang, Moritz Hechtbauer, Elisabeth Khalilov, Evelyn Luise Brinkmann, Vera Schmitt, Nils Feldhus. 1152-1170
- PartisanLens: A Multilingual Dataset of Hyperpartisan and Conspiratorial Immigration Narratives in European Media. Michele Joshua Maggini, Paloma Piot, Anxo Pérez, Erik Bran Marino, Lúa Santamaría Montesinos, Ana Lisboa Cotovio, Marta Vázquez Abuín, Javier Parapar, Pablo Gamallo 0001. 1171-1186
- Adaptive LLM-Symbolic Reasoning via Dynamic Logical Solver Composition. Lei Xu, Pierre Beckmann, Marco Valentino, André Freitas. 1187-1208
- Lexical Popularity: Quantifying the Impact of Pre-training for LLM Performance. Elena Sofia Ruzzetti, Fabio Massimo Zanzotto, Tommaso Caselli. 1209-1230
- Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification. Paul He 0001, Yinya Huang, Mrinmaya Sachan, Zhijing Jin 0001. 1231-1250
- CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures. Punya Syon Pandey, Yongjin Yang, Jiarui Liu 0004, Zhijing Jin 0001. 1251-1266
- Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation. Thomas F. Burns, Letitia Parcalabescu, Stephan Wäldchen, Michael Barlow, Gregor Ziegltrum, Volker Stampa, Bastian Harren, Björn Deiseroth. 1267-1283
- Ultra-Low-Dimensional Prompt Tuning via Random Projection. Zijun Wu 0002, Yongchang Hao, Lili Mou. 1284-1303
- NP-Hard Lower Bound Complexity for Semantic Self-Verification. Robin Young. 1304-1318
- STAMP: Selective Task-Aware Mechanism for Text Privacy. Fengwei Tian, Payel Bhattacharjee, Heidi A. Hanson, Geoffrey D. Rubin, Joseph Y. Lo, Ravi Tandon. 1319-1333
- Deconstructing Instruction-Following: A New Benchmark for Granular Evaluation of Large Language Model Instruction Compliance Abilities. Alberto Purpura, Li Wang, Sahil Badyal, Eugenio Beaufrand, Adam Faulkner. 1334-1349
- Utterance-level Detection Framework for LLM-Involved Content Detection in Conversational Setting. Muyang Zhou, Huaxia Rui. 1350-1366
- Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL. Yifei Shen, Yilun Zhao 0001, Justice Ou, Tinglin Huang 0001, Arman Cohan. 1367-1412
- iBERT: Interpretable Embeddings via Sense Decomposition. Vishal Anand 0002, Milad Alshomary, Kathleen McKeown. 1413-1429
- Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World. Vinu Sankar Sadasivan, Soheil Feizi, Rajiv Mathews, Lun Wang. 1430-1440
- Say It Another Way: Auditing LLMs with a User-Grounded Automated Paraphrasing Framework. Cléa Chataigner, Rebecca Ma, Prakhar Ganesh, Yuhao Chen, Afaf Taïk, Elliot Creager, Golnoosh Farnadi. 1441-1467
- AutoBool: Reinforcement-Learned LLM for Effective Automatic Systematic Reviews Boolean Query Generation. Shuai Wang 0032, Harrisen Scells, Bevan Koopman, Guido Zuccon. 1468-1493
- Improving LLM Domain Certification with Pretrained Guide Models. Jiaqian Zhang, Zhaozhi Qian, Faroq Al-Tam, Ignacio Iacobacci, Muhammad Al-Qurishi, Riad Souissi. 1494-1510
- TDFlow: Agentic Workflows for Test Driven Development. Kevin Han, Siddharth Maddikayala, Tim Knappe, Om Patel, Austen Liao, Amir Barati Farimani. 1511-1527
- Contrastive Learning with Narrative Twins for Modeling Story Salience. Igor Sterner, Alex Lascarides, Frank Keller. 1528-1550
- ExAnte: A Benchmark for Ex-Ante Inference in Large Language Models. Yachuan Liu, Xiaochun Wei, Lin Shi, Xinnuo Li, Bohan Zhang, Paramveer Dhillon, Qiaozhu Mei. 1551-1571
- CRADLE Bench: A Clinician-Annotated Benchmark for Multi-Faceted Mental Health Crisis and Safety Risk Detection. Grace Byun, Rebecca Lipschutz, Sean T. Minton, Abigail Powers, Jinho D. Choi. 1572-1590
- Coordinates from Context: Using LLMs to Ground Complex Location References. Tessa Masis, Brendan T. O'Connor 0001. 1591-1606
- Discourse Graph Guided Document Translation with Large Language Models. Viet-Thanh Pham, Minghan Wang, Hao-Han Liao, Thuy-Trang Vu. 1607-1627
- StarFlow: Generating Structured Workflow Outputs From Sketch Images. Patrice Bechard, Chao Wang, Amirhossein Abaskohi, Juan A. Rodríguez, Christopher Pal, David Vázquez 0001, Spandana Gella, Sai Rajeswar, Perouz Taslakian. 1628-1645
- Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors. Ren-Wei Liang, Chin-Ting Hsu, Chan-Hung Yu, Saransh Agrawal, Shih-Cheng Huang, Chieh-Yen Lin, Shang-Tse Chen, Kuan-Hao Huang, Shao-Hua Sun. 1646-1668
- How Reliable are Confidence Estimators for Large Reasoning Models? A Systematic Benchmark on High-Stakes Domains. Reza Khanmohammadi, Erfan Miahi, Simerjot Kaur, Charese Smiley, Ivan Brugere, Kundan Thind, Mohammad M. Ghassemi. 1669-1754
- SearchLLM: Detecting LLM Paraphrased Text by Measuring the Similarity with Regeneration of the Candidate Source via Search Engine. Hoang-Quoc Nguyen-Son, Minh-Son Dao, Koji Zettsu. 1755-1772
- RoZO: Geometry-Aware Zeroth-Order Fine-Tuning on Low-Rank Adapters for Black-Box Large Language Models. Zichen Song, Weijia Li. 1773-1783
- Mitigating Degree Bias in Hypergraphs via Attribute-as-Structure Approach. Ryusei Nishide, Makoto Miwa. 1784-1801
- Generative Personality Simulation via Theory-Informed Structured Interview. Pengda Wang 0004, Huiqi Zou, Han Jiang 0007, Hanjie Chen, Tianjun Sun, Xiaoyuan Yi, Ziang Xiao, Frederick L. Oswald. 1802-1888
- Unraveling LLM Jailbreaks Through Safety Knowledge Neurons. Chongwen Zhao, Yutong Ke, Kaizhu Huang. 1889-1906
- ELLA: Efficient Lifelong Learning for Adapters in Large Language Models. Shristi Das Biswas, Yue Zhang, Anwesan Pal, Radhika Bhargava, Kaushik Roy. 1907-1924
- LingGen: Scalable Multi-Attribute Linguistic Control via Power-Law Masking. Mohamed Elgaar, Hadi Amiri. 1925-1942
- RECIPE-TKG: From Sparse History to Structured Reasoning for LLM-based Temporal Knowledge Graph Completion. Ömer Faruk Akgül, Feiyu Zhu, Yuxin Yang 0010, Rajgopal Kannan, Viktor K. Prasanna. 1943-1965
- Barriers to Discrete Reasoning with Transformers: A Survey Across Depth, Exactness, and Bandwidth. Michelle Yuan, Weiyi Sun, Amir Rezaeian, Jyotika Singh, Sandip Ghoshal, Yao-Ting Wang, Miguel Ballesteros, Yassine Benajiba. 1966-1978
- PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR. James Burgess, Jan N. Hansen, Duo Peng, Yuhui Zhang, Alejandro Lozano, Min Woo Sun, Emma Lundberg, Serena Yeung-Levy. 1979-1997
- Too Open for Opinion? Embracing Open-Endedness in Large Language Models for Social Simulation. Bolei Ma, Yong Cao 0001, Indira Sen, Anna-Carolina Haensch, Frauke Kreuter, Barbara Plank, Daniel Hershcovich. 1998-2016
- Respecting Temporal-Causal Consistency: Entity-Event Knowledge Graph for Retrieval-Augmented Generation. Ze-yu Zhang, Zitao Li, Yaliang Li, Bolin Ding, Bryan Kian Hsiang Low. 2017-2054
- Knowledge Extraction on Semi-Structured Content: Does It Remain Relevant for Question Answering in the Era of LLMs? Kai Sun 0006, Yin Huang, Srishti Mehra, Mohammad Kachuee, Xilun Chen 0002, Renjie Tao, Zhaojiang Lin, Andrea Jessee, Nirav Shah, Alex Betty, Yue Liu, Anuj Kumar, Wen-tau Yih, Xin Luna Dong. 2055-2074
- A Computational Approach to Visual Metonymy. Saptarshi Ghosh, Linfeng Liu, Tianyu Jiang. 2075-2099
- A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic. Juan Moreno Gonzalez, Bashar Alhafni, Nizar Habash. 2100-2113
- Multimodal Evaluation of Russian-language Architectures. Artem Chervyakov, Ulyana Isaeva, Anton A. Emelyanov, Artem Safin, Maria Tikhonova, Alexander Kharitonov, Yulia Lyakh, Petr Surovtsev, Denis Shevelev, Vildan Saburov, Vasily Konovalov, Elisei Rykov, Ivan Sviridov, Amina Miftakhova, Ilseyar Alimova, Alexander Panchenko, Alexander Kapitanov, Alena Fenogenova. 2114-2161
- Don't Judge a Book by its Cover: Testing LLMs' Robustness Under Logical Obfuscation. Abhilekh Borah, Shubhra Ghosh, Kedar Joshi, Aditya Kumar Guru, Kripabandhu Ghosh. 2162-2180
- I know you are different! Towards Persona Driven Knowledge-infused Dialogue Assistant. Shifali Agrahari, Moushumi Mahato, Abhisek Tiwari, Javaid Nabi. 2181-2205
- Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning. Qifan Yu, Zhenyu He 0012, Sijie Li, Zhou Xun, Jun Zhang, Jingjing Xu, Di He 0001. 2206-2222
- Layer-wise Swapping for Generalizable Multilingual Safety. Hyunseo Shin, Wonseok Hwang. 2223-2238
- Measuring Idiomaticity in Text Embedding Models with epsilon-compositionality. Sondre Wold, Étienne Simon, Erik Velldal, Lilja Øvrelid. 2239-2252
- Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers. Andrew Zhao, Reshmi Ghosh, Vitor Carvalho, Emily Lawton, Keegan Hines, Gao Huang 0001, Jack W. Stokes. 2253-2272
- MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling. Qian Wang, Ziqi Huang, Ruoxi Jia 0001, Paul Debevec, Ning Yu 0006. 2273-2295
- Computational Benchmarks for Egyptian Arabic Child Directed Speech. Salam Khalifa, Abdelrahim Qaddoumi, Nizar Habash, Owen Rambow. 2296-2307
- K-LegalDeID: A Benchmark Dataset and KLUEBERT-CRF for De-identification in Korean Court Judgments. Wooseok Choi, Hyungbin Kim, Yon Dohn Chung. 2308-2325
- Specialization through Collaboration: Understanding Expert Interaction in Mixture-of-Expert Large Language Models. Yuanbo Tang, Naifan Zhang, Yan Tang, Meixuan Chen, Shuhan Huang, Tingyu Cao, Yang Li. 2326-2339
- Compact Language Models with Iterative Text Refinement for Health Dialogue Summarization. Kellen Tan Cheng, Ganesh Ramesh, Nafiul Rashid, Geoffrey J. Tso, Jilong Kuang. 2340-2363
- Mind the Gap: Benchmarking LLM Uncertainty and Calibration with Specialty-Aware Clinical QA and Reasoning-Based Behavioural Features. Alberto Testoni, Iacer Calixto. 2364-2382
- Controlling Reading Ease with Gaze-Guided Text Generation. Andreas Säuberli, Darja Jepifanova, Diego Frassinelli, Barbara Plank. 2383-2397
- PictureStories: Predicting the Task Adherence of Language Learner Answers to a Picture Story-Based Writing Task. Marie Bexte, Andrew Caines, Diane Nicholls, Paula Buttery, Torsten Zesch. 2398-2415
- Assessing the Impact of Typological Features on Multilingual Machine Translation in the Age of Large Language Models. Vitalii Hirak, Jaap Jumelet, Arianna Bisazza. 2416-2434
- Large Language Models as Oracles for Ontology Alignment. Sviatoslav Lushnei, Dmytro Shumskyi, Severyn Shykula, Ernesto Jiménez-Ruiz, Artur d'Avila Garcez. 2435-2449
- Reasoning or Knowledge: Stratified Evaluation of Biomedical LLMs. Rahul Thapa, Qingyang Wu, Kevin Wu, Harrison Zhang, Angela Zhang, Eric Wu, Haotian Ye, James Zou 0001. 2450-2483
- Effective QA-Driven Annotation of Predicate-Argument Relations Across Languages. Jonathan Davidov, Aviv Slobodkin, Shmuel Tomi Klein, Reut Tsarfaty, Ido Dagan, Ayal Klein. 2484-2502
- Form and Meaning in Intrinsic Multilingual Evaluations. Wessel Poelman, Miryam de Lhoneux. 2503-2521
- What Breaks Knowledge Graph based RAG? Benchmarking and Empirical Insights into Reasoning under Incomplete Knowledge. Dongzhuoran Zhou, Yuqicheng Zhu, Xiaxia Wang 0001, Hongkuan Zhou, Yuan He 0008, Jiaoyan Chen 0001, Steffen Staab, Evgeny Kharlamov. 2522-2538
- Assessing Web Search Credibility and Response Groundedness in Chat Assistants. Ivan Vykopal, Matús Pikuliak, Simon Ostermann 0002, Marián Simko. 2539-2560
- When the Model Said 'No Comment', We Knew Helpfulness Was Dead, Honesty Was Alive, and Safety Was Terrified. Gautam Siddharth Kashyap, Mark Dras, Usman Naseem. 2561-2572
- NeuronMoE: Efficient Cross-Lingual Extension via Neuron-Guided Mixture-of-Experts. Rongzhi Li, Hitomi Yanaka. 2573-2586
- From Emotion to Expression: Theoretical Foundations and Resources for Fear Speech. Vigneshwaran Shankaran, Gabriella Lapesa, Claudia Wagner 0001. 2587-2606
- AdaptBPE: From General Purpose to Specialized Tokenizers. Vijini Liyanage, François Yvon. 2607-2620
- Reassessing Active Learning Adoption in Contemporary NLP: A Community Survey. Julia Romberg, Christopher Schröder 0001, Julius Gonsior, Katrin Tomanek, Fredrik Olsson. 2621-2647
- Beyond "Not Novel Enough": Enriching Scholarly Critique with LLM-Assisted Feedback. Osama Mohammed Afzal, Preslav Nakov, Tom Hope, Iryna Gurevych. 2648-2671
- AfriVox: Probing Multilingual and Accent Robustness of Speech LLMs. Busayo Awobade, Mardhiyah Sanni, Tassallah Abdullahi, Chibuzor Okocha, Kelechi Ezema, Devendra Deepak Kayande, Lukman E. Ismaila, Tobi Olatunji, Gloria Ashiya Katuka. 2672-2690
- PortOldBERT: Portuguese Historical Language Models. Tomás Freitas Osório, Henrique Lopes Cardoso. 2691-2705
- ReMedQA: Are We Done With Medical Multiple-Choice Benchmarks? Alessio Cocchieri, Luca Ragazzi, Giuseppe Tagliavini, Gianluca Moro. 2706-2738
- Can Activation Steering Generalize Across Languages? A Study on Syllogistic Reasoning in Language Models. Gabriele Maraia, Leonardo Ranaldi, Marco Valentino, Fabio Massimo Zanzotto. 2739-2753
- SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space. Viktoriia Zinkovich, Anton Antonov, Andrei Spiridonov, Denis Shepelev, Andrey Moskalenko, Daria Pugacheva, Elena Tutubalina, Andrey Kuznetsov, Vlad Shakhuro. 2754-2775
- Knowledge Augmentation Enhances Token Classification for Recipe Understanding. Nuhu Ibrahim, Robert Stevens, Riza Batista-Navarro. 2776-2788
- Argumentation and Judgement Factors: LLM-based Discovery and Application in Insurance Disputes. Basit Ali, Anubhav Sinha, Nitin Ramrakhiyani, Sachin Pawar, Girish Keshav Palshikar, Manoj Apte. 2789-2804
- ViGoEmotions: A Benchmark Dataset For Fine-grained Emotion Detection on Vietnamese Texts. Quan Hung Tran, Pham Tien Nam, Son T. Luu, Kiet Van Nguyen. 2805-2831
- PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs. Manuel Frank, Haithem Afli. 2832-2851
- DETECT: Determining Ease and Textual Clarity of German Text Simplifications. Maria Korobeynikova, Alessia Battisti, Lukas Fischer 0003, Yingqiang Gao. 2852-2882
- MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support. Wei-Ling Hsu, Yu-Chien Tang, An-Zi Yen. 2883-2901
- Test-Time Scaling of Reasoning Models for Machine Translation. Zihao Li, Shaoxiong Ji, Jörg Tiedemann. 2902-2917
- How Good Are LLMs at Processing Tool Outputs? Kiran Kate, Yara Rizk, Poulami Ghosh, Ashu Gulati, Tathagata Chakraborti, Zidane Wright, Mayank Agarwal. 2918-2941
- Tug-of-war between idioms' figurative and literal interpretations in LLMs. Soyoung Oh, Xinting Huang, Mathis Pink, Michael Hahn 0001, Vera Demberg. 2942-2958
- Do LLM hallucination detectors suffer from low-resource effect? Debtanu Datta, Mohan Kishore Chilukuri, Yash Kumar, Saptarshi Ghosh 0001, Muhammad Bilal Zafar. 2959-2985
- Coupling Local Context and Global Semantic Prototypes via a Hierarchical Architecture for Rhetorical Roles Labeling. Anas Belfathi, Nicolas Hernandez, Laura Monceaux, Warren Bonnard, Mary Catherine Lavissière, Christine Jacquin, Richard Dufour. 2986-3004
- Guided by the Plan: Enhancing Faithful Autoregressive Text-to-Audio Generation with Guided Decoding. Juncheng Wang, Zhe Hu, Chao Xu 0023, Siyue Ren, Yuxiang Feng, Yang Liu 0356, Baigui Sun, Shujun Wang. 3005-3018
- Safe-Unsafe Concept Separation Emerges from a Single Direction in Language Models Activation Space. Andrea Ermellino, Lorenzo Malandri, Fabio Mercorio, Antonio Serino. 3019-3034
- PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark. Róbert Belanec, Branislav Pecher, Ivan Srba, Mária Bieliková. 3035-3054
- Decoding the Market's Pulse: Context-Enriched Agentic Retrieval Augmented Generation for Predicting Post-Earnings Price Shocks. Chenhui Li, Weihai Lu. 3055-3073
- LAILA: A Large Trait-Based Dataset for Arabic Automated Essay Scoring. May Bashendy, Walid Massoud, Sohaila Eltanbouly, Salam Albatarni, Marwan Sayed, Abrar Abir, Houda Bouamor, Tamer Elsayed. 3074-3091
- Live API-Bench: 2500+ Live APIs for Testing Multi-Step Tool Calling. Benjamin Elder, Anupama Murthi, Jungkoo Kang, Ankita Naik, Kinjal Basu 0002, Kiran Kate, Danish Contractor. 3092-3124
- MALicious INTent Dataset and Inoculating LLMs for Enhanced Disinformation Detection. Arkadiusz Modzelewski, Witold Sosnowski, Eleni Papadopulos, Elisa Sartori, Tiziano Labruna, Giovanni Da San Martino, Adam Wierzbicki. 3125-3148
- When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training. Felicia Körner, Max Müller-Eberstein, Anna Korhonen, Barbara Plank. 3149-3169
- Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models. Qiao Liang, Yanjiang Liu, Weixiang Zhou, Ben He 0001, Yaojie Lu 0001, Hongyu Lin, Jia Zheng 0009, Xianpei Han, Le Sun 0001, Yingfei Sun. 3170-3184
- Classifying and Addressing the Diversity of Errors in Retrieval-Augmented Generation Systems. Kin Kwan Leung, Mouloud Belbahri, Yi Sui 0001, Alex Labach, Xueying Zhang, Stephen Rose, Jesse C. Cresswell. 3185-3207
- Helios: A Foundational Language Model for Smart Energy Knowledge Reasoning and Application. Haoyu Jiang, Fanjie Zeng, Boan Qu, Xiaojie Lin, Wei Zhong. 3208-3220
- AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders. Georgii Aparin, Tasnima Sadekova, Alexey D. Rukhovich, Assel Yermekova, Laida Kushnareva, Vadim Popov, Kristian Kuznetsov, Irina Piontkovskaya. 3221-3254
- Vision-Language Models Align with Human Neural Representations in Concept Processing. Anna Bavaresco, Marianne de Heer Kloots, Sandro Pezzelle, Raquel Fernández. 3255-3274
- FAID: Fine-grained AI-generated Text Detection using Multi-task Auxiliary and Multi-level Contrastive Learning. Minh Ngoc Ta, Dong Cao Van, Duc Anh Hoang, Minh Le-Anh, Truong Nguyen, My Anh Tran Nguyen, Yuxia Wang 0003, Preslav Nakov, Dinh Viet Sang. 3275-3296
- BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data. Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, Diana Galván-Sosa, Faiz Ghifari Haznitrama, Francesca Padovani, Francois Meyer, Hai Hu, Julen Etxaniz, Laurent Prévot 0001, Linyang He, María Grandury, Mila Marcheva, Negar Foroutan, Nikitas Theodoropoulos, Pouya Sadeghi, Siyuan Song, Suchir Salhan, Susana Zhou, Yurii Paniv, Ziyin Zhang, Arianna Bisazza, Alex Warstadt, Leshem Choshen. 3297-3329
- Personality Editing for Language Models through Adjusting Self-Referential Queries. Seojin Hwang, Yumin Kim, Byeongjeong Kim, Donghoon Shin, Hwanhee Lee. 3330-3351
- How Much Pretraining Does Structured Data Need? Daniel Fadlon, Kfir Bar. 3352-3365
- Finding Culture-Sensitive Neurons in Vision-Language Models. Xiutian Zhao, Rochelle Choenni, Rohit Saxena, Ivan Titov 0001. 3366-3381
- Polyglots or Multitudes? Multilingual LLM Answers to Value-laden Multiple-Choice Questions. Léo Labat, Étienne Ollion, François Yvon. 3382-3398
- ABCD-LINK: Annotation Bootstrapping for Cross-Document Fine-Grained Links. Serwar Basch, Ilia Kuznetsov, Tom Hope, Iryna Gurevych. 3399-3423
- Decision-Making with Deliberation: Meta-reviewing as a Document-grounded Dialogue. Sukannya Purkayastha, Nils Dycke, Anne Lauscher, Iryna Gurevych. 3424-3465
- HalluZig: Hallucination Detection using Zigzag Persistence. Shreyas N. Samaga, Gilberto Gonzalez Arroyo, Tamal K. Dey. 3466-3482
- Mapping the Course for Prompt-based Structured Prediction. Matt Pauk, Maria Leonor Pacheco. 3483-3508
- Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models. Runpeng Dai, Run Yang, Fan Zhou, Hongtu Zhu. 3509-3521
- Martingale Foresight Sampling: A Principled Approach to Inference-Time LLM Decoding. Huayu Li, Zhengxiao He, Siyuan Tian, Jinghao Wen, Ao Li. 3522-3533
- Is This LLM Library Learning? Evaluation Must Account For Compute and Behaviour. Ian Berlot-Attwell, Tobias Sesterhenn, Frank Rudzicz, Xujie Si. 3534-3568
- Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA. Jongwoo Park 0003, Kanchana Ranasinghe, Kumara Kahatapitiya, Wonjeong Ryu, Donghyun Kim 0006, Michael S. Ryoo. 3569-3588
- A Unified View on Emotion Representation in Large Language Models. Aishwarya Maheswaran, Maunendra Sankar Desarkar. 3589-3610
- TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models. Shima Imani, Seungwhan Moon, Lambert Mathias, Lu Zhang, Babak Damavandi. 3611-3625
- ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs. Mohamed Elaraby, Diane J. Litman. 3626-3643
- AudioJudge: Understanding What Works in Large Audio Model Based Speech Evaluation. Potsawee Manakul, Woody Haosheng Gan, Michael J. Ryan, Ali Sartaz Khan, Warit Sirichotedumrong, Kunat Pipatanakul, William Barr Held, Diyi Yang. 3644-3663
- Learning Multilingual Agentic Policy to Control Sycophancy. Leonardo Ranaldi, Giulia Pucci. 3664-3681
- ToxiPrompt: A Two-Stage Red-Teaming Approach for Balancing Adversarial Prompt Diversity and Response Toxicity. Seungho Lee, Kyumin Lee. 3682-3696
- AfriMTEB and AfriE5: Benchmarking and Adapting Text Embedding Models for African Languages. Kosei Uemura, Miaoran Zhang, David Ifeoluwa Adelani. 3697-3717
- Better Generalizing to Unseen Concepts: An Evaluation Framework and An LLM-Based Auto-Labeled Pipeline for Biomedical Concept Recognition. Shanshan Liu, Noriki Nishida, Fei Cheng 0002, Narumi Tokunaga, Rumana Ferdous Munne, Yuki Yamagata, Kouji Kozaki, Takehito Utsuro, Yuji Matsumoto 0001. 3718-3734
- A Representation Sharpening Framework for Zero Shot Dense Retrieval. Dhananjay Ashok, Suraj Nair 0001, Mutasem Al-Darabsah, Choon Hui Teo, Tarun Agarwal, Jonathan May. 3735-3751
- Spotlight Your Instructions: Instruction-following with Dynamic Attention Steering. Praveen Venkateswaran, Danish Contractor. 3752-3770
- FormGym: Doing Paperwork with Agents. Matthew Toles, Isaac Song, Rattandeep Singh, Zhou Yu. 3771-3785
- NarraBench: A Comprehensive Framework for Narrative Benchmarking. Sil Hamilton, Matthew Wilkens, Andrew Piper. 3786-3801
- FaithLM: Towards Faithful Explanations for Large Language Models. Yu-Neng Chuang, Guanchu Wang, Chia-Yuan Chang 0002, Ruixiang Tang, Shaochen Zhong, Fan Yang 0023, Andrew Wen, Mengnan Du, Xuanting Cai, Vladimir Braverman, Xia Hu 0001. 3802-3824
- Is Information Density Uniform when Utterances are Grounded on Perception and Discourse? Matteo Gay, Coleman Haley, Mario Giulianelli, Edoardo M. Ponti. 3825-3853
- KAD: A Framework for Proxy-based Test-time Alignment with Knapsack Approximation Deferral. Ayoub Hammal, Pierre Zweigenbaum, Caio Corro. 3854-3872
- When Can We Trust LLMs in Mental Health? Large-Scale Benchmarks for Reliable LLM Evaluation. Abeer Badawi, Elahe Rahimi, Md. Tahmid Rahman Laskar, Sheri Grach, Lindsay Bertrand, Lames Danok, Prathiba Dhanesh, Jimmy Huang 0001, Frank Rudzicz, Elham Dolatabadi. 3873-3896
- DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout StructuresBenno Uthayasooriyar, Antoine Ly, Franck Vermet, Caio Corro. 3897-3907 [doi]
- IDEAlign: Comparing Ideas of Large Language Models to Domain ExpertsHyunJi Nam, Lucía Langlois, James Malamut, Mei Tan, Dorottya Demszky. 3908-3925 [doi]
- Amory: Building Coherent Narrative-Driven Agent Memory through Agentic ReasoningYue Zhou, Xiaobo Guo, Belhassen Bayar, Srinivasan H. Sengamedu. 3926-3938 [doi]
- It's All About the Confidence: An Unsupervised Approach for Multilingual Historical Entity Linking using Large Language ModelsCristian Santini, Marieke van Erp, Mehwish Alam. 3939-3954 [doi]
- SoS: Analysis of Surface over Semantics in Multilingual Text-To-Image GenerationCarolin Holtermann, Florian Schneider 0001, Anne Lauscher. 3955-3995 [doi]
- Gender and Politeness Perception: A Novel Approach for Exploring Annotations DisagreementAhmad Aljanaideh. 3996-4005 [doi]
- TempViz: On the Evaluation of Temporal Knowledge in Text-to-Image ModelsCarolin Holtermann, Nina Krebs, Anne Lauscher. 4006-4028 [doi]
- ToxiGAN: Toxic Data Augmentation via LLM-Guided Directional Adversarial GenerationPeiran Li, Jan Fillies, Adrian Paschke. 4029-4044 [doi]
- Text Classification Under Class Distribution Shift: A SurveyAdriana Valentina Costache, Silviu Florin Gheorghe, Eduard Gabriel Poesina, Paul Irofti, Radu-Tudor Ionescu. 4045-4060 [doi]
- Reasoning's Razor: Reasoning Improves Accuracy but Hurts Recall at Critical Operating Points in Safety and Hallucination DetectionAtoosa Malemir Chegini, Hamid Kazemi, Garrett Souza, Maria Safi, Yang Song 0009, Samy Bengio, Sinead Williamson, Mehrdad Farajtabar. 4061-4086 [doi]
- Instructional Agents: Reducing Teaching Faculty Workload through Multi-Agent Instructional DesignHuaiyuan Yao, Wanpeng Xu, Justin Turnau, Nadia Kellam, Hua Wei 0001. 4087-4109 [doi]
- Rethinking Reading Order: Toward Generalizable Document Understanding with LLM-based Relation ModelingWeishi Wang, Hengchang Hu, Daniel Dahlmeier. 4110-4130 [doi]
- Validating Automatic Evaluation of Controllable Counterspeech Generation: Rankings Matter More Than ScoresYi Zheng, Björn Ross, Walid Magdy. 4131-4146 [doi]
- Journey Before Destination: On the importance of Visual Faithfulness in Slow ThinkingRheeya Uppaal, Phu Mon Htut, Min Bai, Nikolaos Pappas 0004, Zheng Qi, Sandesh Swamy. 4147-4168 [doi]
- Automating Android Build Repair: Bridging the Reasoning-Execution Gap in LLM Agents with Domain-Specific ToolsHa Min Son, Huan Ren, Xin Liu, Zhe Zhao. 4169-4189 [doi]
- MetaLead: A Comprehensive Human-Curated Leaderboard Dataset for Transparent Reporting of Machine Learning ExperimentsRoelien C. Timmer, Necva Bölücü, Stephen Wan 0001. 4190-4206 [doi]
- Enhancing the Safety of Medical Vision-Language Models by Synthetic DemonstrationsZhiyu Xue, Reza Abbasi-Asl, Ramtin Pedarsani. 4207-4220 [doi]
- HateXScore: A Metric Suite for Evaluating Reasoning Quality in Hate Speech ExplanationsYujia Hu, Roy Ka-Wei Lee. 4221-4240 [doi]
- Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics?Zhengyang Shan, Aaron Mueller. 4241-4265 [doi]
- A Survey on LLM-based Conversational User SimulationBo Ni, Yu Wang 0160, Leyao Wang, Branislav Kveton, Franck Dernoncourt, Yu Xia, Hongjie Chen, Reuben Luera, Samyadeep Basu, Subhojyoti Mukherjee, Puneet Mathur, Nesreen K. Ahmed, Junda Wu, Li Li, Huixin Zhang, Ruiyi Zhang 0002, Tong Yu 0001, SungChul Kim, Jiuxiang Gu, Zhengzhong Tu, Alexa F. Siu, Zichao Wang, Seunghyun Yoon 0002, Nedim Lipka, Namyong Park 0001, Zihao Lin, Trung Bui, Yue Zhao 0016, Tyler Derr, Ryan A. Rossi. 4266-4301 [doi]
- Prompt-driven Detection of Offensive Urdu Language using Large Language ModelsIffat Maab, Usman Haider, Junichi Yamagishi. 4302-4327 [doi]
- Zer0-Jack: A memory-efficient gradient-based jailbreaking method for black box Multi-modal Large Language ModelsTiejin Chen, Kaishen Wang, Hua Wei 0001. 4328-4344 [doi]
- RAGPPI: Retrieval-Augmented Generation Benchmark for Protein-Protein Interactions in Drug DiscoveryYoungseung Jeon, Ziwen Li 0001, Thomas Li, JiaSyuan Chang, Morteza Ziyadi, Xiang Anthony Chen. 4345-4363 [doi]
- Don't Generate, Classify! Low-Latency Prompt Optimization with Structured Complementary PromptHee-Soo Kim, Jun-Young Kim, Jeong Hwan Lee, Seong-Jin Park, Kang Min Kim. 4364-4383 [doi]
- CHROMIC: Chronological Reasoning Across Multi-Panel ComicsBingxuan Hou, Jiayi Lin 0003, Chenyang Zhang 0004, Dapeng Yin, Shuyue Zhu, Qingqing Hong, Mengna Gao, Jun-li Wang 0001. 4384-4400 [doi]
- GAST: Gradient-aligned Sparse Tuning of Large Language Models with Data-layer SelectionKai Yao, Zhenghan Song, Kaixin Wu, Mingjie Zhong, Danzhao Cheng, Zhaorui Tan, Yixin Ji, Penglei Gao. 4401-4416 [doi]
- BeDiscovER: The Benchmark of Discourse Understanding in the Era of Reasoning Language ModelsChuyuan Li, Giuseppe Carenini. 4417-4479 [doi]
- Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient ReasoningChuang Zhang, Zizhen Zhu, Yihao Wei, Bing Tian, Junyi Liu, Henan Wang, Wang Xavier, Yaxiao Liu. 4480-4501 [doi]
- Chat-Ghosting: Methods for Auto-Completion in Dialog SystemsAnubhab Mandal, Sandeep Mishra, Bishal Santra, Tushar Abhishek, Pawan Goyal 0002, Manish Gupta 0001. 4502-4528 [doi]
- Attribution-Guided Multi-Object Hallucination and Bias Detection in Vision-Language ModelsSirat Samyoun, Yingtai Xiao, Jian Du. 4529-4548 [doi]
- Word Surprisal Correlates with Sentential Contradiction in LLMsNing Shi, Bradley Hauer, David Basil, John Zhang, Grzegorz Kondrak. 4549-4564 [doi]
- ARREST: Adversarial Resilient Regulation Enhancing Safety and Truth in Large Language ModelsSharanya Dasgupta, Arkaprabha Basu, Sujoy Nath, Swagatam Das. 4565-4584 [doi]
- Re2-DocRED: Revisiting Revisited-DocRED for Joint Entity and Relation ExtractionChen Kim Heng, Shao Wen Tong, Julian Wong Wei Sheng. 4585-4621 [doi]
- Where Do LLMs Compose Meaning? A Layerwise Analysis of Compositional RobustnessNura Aljaafari, Danilo S. Carvalho, André Freitas. 4622-4646 [doi]
- BLEnD-Vis: Benchmarking Multimodal Cultural Understanding in Vision Language ModelsBryan Chen Zhengyu Tan, Weihua Zheng, Zhengyuan Liu, Nancy F. Chen, Hwaran Lee, Kenny Tsu Wei Choo, Roy Ka-Wei Lee. 4647-4669 [doi]
- Document-Level Zero-Shot Relation Extraction with Entity Side InformationMohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam. 4670-4680 [doi]
- Steering Large Language Models for Machine Translation PersonalizationDaniel Scalena, Gabriele Sarti, Arianna Bisazza, Elisabetta Fersini, Malvina Nissim. 4681-4701 [doi]
- Taxation Perspectives from Large Language Models: A Case Study on Additional Tax PenaltiesEunkyung Choi, Young Jin Suh, Siun Lee, Hongseok Oh, Juheon Kang, Won Hur, Hun Park, Wonseok Hwang. 4702-4726 [doi]
- Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge EditingShigeng Chen, Linhao Luo, Zhangchi Qiu, Yanan Cao 0001, Carl Yang 0001, Shirui Pan. 4727-4751 [doi]
- Unlocking Latent Discourse Translation in LLMs Through Quality-Aware DecodingWafaa Mohammed, Vlad Niculae, Chrysoula Zerva. 4752-4774 [doi]
- Cross-lingual and Word-Independent Methods for Quantifying Degree of GrammaticalizationRyo Nagata, Daichi Mochihashi, Misato Ido, Yusuke Kubota, Naoki Otani, Yoshifumi Kawasaki, Hiroya Takamura. 4775-4787 [doi]
- Knowing the Facts but Choosing the Shortcut: Understanding How Large Language Models Compare EntitiesHans Hergen Lehmann, Jae Hee Lee 0001, Steven Schockaert, Stefan Wermter. 4788-4821 [doi]
- Calibrating Beyond English: Language Diversity for Better Quantized Multilingual LLMsEverlyn Asiko Chimoto, Mostafa Elhoushi, Bruce A. Bassett. 4822-4838 [doi]
- LaCoMSA: Language-Consistency Multilingual Self-Alignment with Latent Representation RewardingKhanh-Tung Tran, Barry O'Sullivan, Hoang D. Nguyen. 4839-4853 [doi]
- Can you map it to English? The Role of Cross-Lingual Alignment in the Multilingual Performance of LLMsKartik Ravisankar, HyoJung Han, Sarah Wiegreffe, Marine Carpuat. 4854-4872 [doi]
- Recursive numeral systems are highly regular and easy to processPonrawee Prasertsom, Andrea Silvi, Jennifer Culbertson, Devdatt P. Dubhashi, Moa Johansson 0001, Kenny Smith. 4873-4885 [doi]
- Bringing Emerging Architectures to Sequence Labeling in NLPAna Ezquerro, Carlos Gómez-Rodríguez, David Vilares 0001. 4886-4909 [doi]
- SEMIROUTER: Sparse-Data Enhanced Routing for Adaptive Multi-LLM SystemZijie Wang, Xinyu Yan, Che Wang, Zihao Zeng, Lei Xiao, Wei Yang Bryan Lim. 4910-4921 [doi]
- DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge DistillationHyeseon An, Shinwoo Park, Suyeon Woo, Yo-Sub Han. 4922-4936 [doi]
- Boundary-Aware LLM Augmentation for Low-Resource Event Argument ExtractionZhaoyue Sun, Gabriele Pergola, Yulan He 0001. 4937-4953 [doi]
- CASE - Condition-Aware Sentence Embeddings for Conditional Semantic Textual Similarity MeasurementGaifan Zhang, Yi Zhou 0019, Danushka Bollegala. 4954-4968 [doi]
- Evaluation and LLM-Guided Learning of ICD Coding RationalesMingyang Li, Viktor Schlegel, Tingting Mu, Wuraola Oyewusi, Kai Kang, Goran Nenadic. 4969-5003 [doi]
- Evaluating the Effect of Retrieval Augmentation on Social BiasesTianhui Zhang, Yi Zhou 0019, Danushka Bollegala. 5004-5026 [doi]
- Persuasion at Play: Understanding Misinformation Dynamics in Demographic-Aware Human-LLM InteractionsAngana Borah, Rada Mihalcea, Verónica Pérez-Rosas. 5027-5053 [doi]
- Entropy-Gated Branching for Efficient Test-Time ReasoningXianzhi Li, Ethan Callanan, Abdellah Ghassel, Xiaodan Zhu 0001. 5054-5069 [doi]
- Decomposition-Enhanced Training for Post-Hoc Attributions in Language ModelsSriram Balasubramanian, Samyadeep Basu, Koustava Goswami, Ryan Anthony Rossi, Varun Manjunatha, Roshan Santhosh, Ruiyi Zhang 0002, Soheil Feizi, Nedim Lipka. 5070-5084 [doi]
- INSURE-Dial: A Phase-Aware Conversational Dataset Benchmark for Compliance Verification and Phase DetectionShubham Kulkarni, Alexander Lyzhov, Preetam Joshi, Shiva Chaitanya. 5085-5109 [doi]
- NLP for Social Good: A Survey and Outlook of Challenges, Opportunities and Responsible DeploymentAntonia Karamolegkou, Angana Borah, Eunjung Cho, Sagnik Ray Choudhury, Martina Galletti, Pranav Gupta, Oana Ignat, Priyanka Kargupta, Neema Kotonya, Hemank Lamba, Sun-Joo Lee, Arushi Mangla, Ishani Mondal, Fatima Zahra Moudakir, Deniz Nazar, Poli Nemkova, Dina Pisarevskaya, Naquee Rizwan, Nazanin Sabri, Keenan Samway, Dominik Stammbach, Anna Steinberg, David Tomás 0001, Steven R. Wilson 0001, Bowen Yi 0001, Jessica H. Zhu, Arkaitz Zubiaga, Anders Søgaard, Alexander Fraser 0001, Zhijing Jin 0001, Rada Mihalcea, Joel R. Tetreault, Daryna Dementieva. 5110-5170 [doi]
- From Delegates to Trustees: How Optimizing for Long-Term Interests Shapes Bias and Alignment in LLMsSuyash Fulay, Jocelyn Zhu, Michiel A. Bakker. 5171-5194 [doi]
- Investigating Language and Retrieval Bias in Multilingual Previously Fact-Checked Claim DetectionIvan Vykopal, Antonia Karamolegkou, Jaroslav Kopcan, Qiwei Peng 0003, Tomas Javurek, Michal Gregor, Marián Simko. 5195-5221 [doi]
- FFE-Hallu: Hallucinations in Fixed Figurative Expressions: A Benchmark of Idioms and Proverbs in the Persian LanguageFaezeh Hosseini, Mohammadali Yousefzadeh, Yadollah Yaghoobzadeh. 5222-5235 [doi]
- MEVER: Multi-Modal and Explainable Claim Verification with Graph-based Evidence RetrievalDelvin Ce Zhang, Suhan Cui, Zhelin Chu, Xianren Zhang, Dongwon Lee 0001. 5236-5255 [doi]
- DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal UnderstandingShubham Patle, Sara Ghaboura, Hania Tariq, Mohammad Usman Khan, Omkar Thawakar, Rao Muhammad Anwer, Salman H. Khan 0001. 5256-5269 [doi]
- ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational RecommendersOfer Meshi, Krisztian Balog, Sally Goldman, Avi Caciularu, Guy Tennenholtz, Jihwan Jeong, Amir Globerson, Craig Boutilier. 5270-5304 [doi]
- Detecting Latin in Historical Books with Large Language Models: A Multimodal BenchmarkYu Wu, Ke Shu, Jonas Fischer, Lidia Pivovarova, David Rosson, Eetu Mäkelä, Mikko Tolonen. 5305-5328 [doi]
- Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended InteractionsPedro Henrique Luz de Araujo, Michael A. Hedderich, Ali Modarressi, Hinrich Schütze, Benjamin Roth 0001. 5329-5359 [doi]
- CliniBench: A Clinical Outcome Prediction Benchmark for Generative and Encoder-Based Language ModelsPaul Grundmann, Jan Frick, Dennis Fast, Thomas Steffek, Felix A. Gers, Wolfgang Nejdl, Alexander Löser. 5360-5378 [doi]
- DIVINE: Coordinating Multimodal Disentangled Representations for Oro-Facial Neurological Disorder AssessmentMohd Mujtaba Akhtar, Girish, Muskaan Singh. 5379-5392 [doi]
- Biasless Language Models Learn Unnaturally: How LLMs Fail to Distinguish the Possible from the ImpossibleImry Ziv, Nur Geffen Lan, Emmanuel Chemla. 5393-5403 [doi]
- Bridging Attribution and Open-Set Detection using Graph-Augmented Instance Learning in Synthetic SpeechMohd Mujtaba Akhtar, Girish, Farhan Sheth, Muskaan Singh. 5404-5413 [doi]
- Detecting Non-Membership in LLM Training Data via Rank CorrelationsPranav Shetty, Mirazul Haque, Zhiqiang Ma, Xiaomo Liu. 5414-5429 [doi]
- Taming Object Hallucinations with Verified Atomic Confidence EstimationJiarui Liu 0004, Weihao Xuan, Zhijing Jin 0001, Mona T. Diab. 5430-5444 [doi]
- DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal ReasoningNithin Sivakumaran, Justin Chih-Yao Chen, David Wan, Yue Zhang, Jaehong Yoon, Elias Stengel-Eskin, Mohit Bansal. 5445-5464 [doi]
- ToolDreamer: Instilling LLM Reasoning Into Tool RetrieversSaptarshi Sengupta, Zhengyu Zhou, Jun Araki, Xingbo Wang, Bingqing Wang, Suhang Wang, Zhe Feng 0003. 5465-5482 [doi]
- An Empirical Study of Speculative Decoding for Small Language ModelsLuca Mainardi, Selçuk Sandikci, Joaquin Vanschoren. 5483-5497 [doi]
- Lost in Formatting: How Output Formats Skew LLM Performance on Information ExtractionRishi Ravikumar, Nuhu Ibrahim, Riza Batista-Navarro. 5498-5513 [doi]
- Pseudo-Likelihood Training for Reasoning Diffusion Language ModelsShiv Shankar. 5514-5529 [doi]
- RoSE: Round-robin Synthetic Data Evaluation for Selecting LLM Generators without Human Test SetsJán Cegin, Branislav Pecher, Ivan Srba, Jakub Simko. 5530-5545 [doi]
- RotBench: Evaluating Multi-modal Large Language Models on Identifying Image RotationTianyi Niu, Jaemin Cho 0001, Elias Stengel-Eskin, Mohit Bansal. 5546-5569 [doi]
- Multilingual Amnesia: On the Transferability of Unlearning in Multilingual LLMsAlireza Dehghanpour Farashah, Aditi Khandelwal, Marylou Fauchard, Zhuan Shi, Negar Rostamzadeh, Golnoosh Farnadi. 5570-5589 [doi]
- Beyond Math: Stories as a Testbed for Memorization-Constrained Reasoning in LLMsYuxuan Jiang, Francis Ferraro. 5590-5607 [doi]
- Neural Breadcrumbs: Membership Inference Attacks on LLMs Through Hidden State and Attention Pattern AnalysisDisha Makhija, Manoj Ghuhan Arivazhagan, Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah. 5608-5620 [doi]
- Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language DataPaul Quinlan, Qingguo Li, Xiaodan Zhu 0001. 5621-5647 [doi]
- Beyond Names: How Grammatical Gender Markers Bias LLM-based Educational RecommendationsLuca Benedetto, Antonia Donvito, Alberto Lucchetti, Andrea Cappelli, Paula Buttery. 5648-5668 [doi]
- ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document ImagesMathieu Sibue, Andrés Muñoz Garza, Samuel Mensah, Pranav Shetty, Zhiqiang Ma, Xiaomo Liu, Manuela Veloso. 5669-5688 [doi]
- What's Missing in Vision-Language Models? Probing Their Struggles with Causal Order ReasoningZhaotian Weng, Haoxuan Li, Xin Eric Wang, Kuan-Hao Huang, Jieyu Zhao. 5689-5701 [doi]
- KidsArtBench: Multi-Dimensional Children's Art Evaluation with Attribute-Aware MLLMsMingrui Ye, Chanjin Zheng, Zengyi Yu, Chenyu Xiang, Zhixue Zhao, Zheng Yuan, Helen Yannakoudakis. 5702-5722 [doi]
- Steering Safely or Off a Cliff? Rethinking Specificity and Robustness in Inference-Time InterventionsNavita Goyal, Hal Daumé III. 5723-5738 [doi]
- Tracing Multilingual Knowledge Acquisition Dynamics in Domain Adaptation: A Case Study of Biomedical AdaptationXin Zhao, Naoki Yoshinaga 0001, Tsuta Yuma, Akiko Aizawa. 5739-5760 [doi]
- Contextual morphologically-guided tokenization for Latin encoder modelsMarisa Hudspeth, Patrick J. Burns 0001, Brendan T. O'Connor 0001. 5761-5775 [doi]
- Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum LearningYang Zhang, Amr Mohamed 0005, Hadi Abdine, Guokan Shang, Michalis Vazirgiannis. 5776-5794 [doi]
- ObjChangeVR: Object State Change Reasoning from Continuous Egocentric Views in VR EnvironmentsShiyi Ding, Shaoen Wu, Ying Chen. 5795-5812 [doi]
- Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting KnowledgeYiyang Feng, Zeming Chen 0001, Haotian Wu, Jiawei Zhou, Antoine Bosselut. 5813-5847 [doi]
- Do Audio LLMs Really LISTEN, or Just Transcribe? Measuring Lexical vs. Acoustic Emotion Cues RelianceJingyi Chen, Zhimeng Guo, Jiyun Chun, Pichao Wang, Andrew Perrault, Micha Elsner. 5848-5877 [doi]
- CSPB: Conversational Speech Processing Benchmark for Self-supervised Speech ModelsZili Huang, Matthew Maciejewski, Leibny Paola García-Perera, Shinji Watanabe 0001, Sanjeev Khudanpur. 5878-5893 [doi]
- Multi-Token Completion for Text AnonymizationPulkit Madaan, Krithika Ramesh, Lisa Bauer, Charith Peris, Anjalie Field. 5894-5908 [doi]
- MERLIN: Multi-Stage Curriculum Alignment for Multilingual Encoder-LLM Integration in Cross-Lingual ReasoningKosei Uemura, David Guzmán, Quang-Phuoc Nguyen, Jesujoba Oluwadara Alabi, En-Shiun Annie Lee, David Ifeoluwa Adelani. 5909-5924 [doi]
- Now You Hear Me: Audio Narrative Attacks Against Large Audio-Language ModelsYe Yu, Haibo Jin, Yaoning Yu, Jun Zhuang 0004, Haohan Wang. 5925-5939 [doi]
- Evaluating Adversarial Robustness of Concept Representations in Sparse AutoencodersAaron J. Li, Suraj Srinivas, Usha Bhalla, Himabindu Lakkaraju. 5940-5957 [doi]
- Mary, the Cheeseburger-Eating Vegetarian: Do LLMs Recognize Incoherence in Narratives?Karin de Langis, Püren Öncel, Ryan Peters, Andrew Elfenbein, Laura Kristen Allen, Andreas Schramm, Dongyeop Kang. 5958-5970 [doi]
- Strong Memory, Weak Control: An Empirical Study of Executive Functioning in LLMsKarin de Langis, Jong Inn Park, Khanh Chi Le, Andreas Schramm, Andrew Elfenbein, Michael C. Mensink, Dongyeop Kang. 5971-5986 [doi]
- How Do Language Models Acquire Character-Level Information?Soma Sato, Ryohei Sasano. 5987-5997 [doi]
- Analysing the role of lexical and temporal information in turn-taking through predictabilitySean Leishman, Sarenne Wallbridge, Peter Bell 0001. 5998-6009 [doi]
- Beyond Length: Context-Aware Expansion and Independence as Developmentally Sensitive Evaluation in Child UtterancesJiyun Chun, Eric Fosler-Lussier, Michael White 0001, Andrew Perrault. 6010-6030 [doi]
- Translation via Annotation: A Computational Study of Translating Classical Chinese into JapaneseZilong Li, Jie Cao. 6031-6045 [doi]
- Extending Audio Context for Long-Form Understanding in Large Audio-Language ModelsYuatyong Chaichana, Pittawat Taveekitworachai, Warit Sirichotedumrong, Potsawee Manakul, Kunat Pipatanakul. 6046-6066 [doi]
- HALP: Detecting Hallucinations in Vision-Language Models without Generating a Single TokenSai Akhil Kogilathota, Sripadha Vallabha E. G, Luzhe Sun, Jiawei Zhou. 6067-6085 [doi]
- Nanda Family: Open-Weights Generative Large Language Models for HindiAaryamonvikram Singh, Debopriyo Banerjee, Dhruv Sahnan, Monojit Choudhury, Shivam Chauhan, Rocktim Jyoti Das, Xudong Han, Haonan Li 0002, Alok Anil Jadhav, Utkarsh Agarwal, Mukund Choudhary, Fajri Koto, Junaid Bhat, Awantika Shukla, Samujjwal Ghosh, Samta Kamboj, Onkar Pandit, Lalit Pradhan, Rahul Pal, Sunil Kumar Sahu, Parvez Mullah, Ali El Filali, Zainul Abedien Ahmed Quraishi, Neha Sengupta, Gokul Ramakrishnan, Rituraj Joshi, Gurpreet Gosal, Avraham Sheinin, Natalia Vassilieva, Preslav Nakov. 6086-6108 [doi]
- Wugnectives: Novel Entity Inferences of Language Models from Discourse ConnectivesDaniel Brubaker, William Sheffield, Junyi Jessy Li, Kanishka Misra. 6109-6127 [doi]
- Can LLMs reason over extended multilingual contexts? Towards long-context evaluation beyond retrieval over haystacksAmey Hengle, Prasoon Bajpai, Soham Dan, Tanmoy Chakraborty 0002. 6128-6152 [doi]
- Knowing When to Abstain: Medical LLMs Under Clinical UncertaintySravanthi Machcha, Sushrita Yerra, Sahil Gupta, Aishwarya Sahoo, Sharmin Sultana, Hong Yu 0001, Zonghai Yao. 6153-6182 [doi]
- MedQA-CS: Objective Structured Clinical Examination (OSCE)-Style Benchmark for Evaluating LLM Clinical SkillsZonghai Yao, Zihao Zhang 0001, Chaolong Tang, Xingyu Bian, Youxia Zhao, Zhichao Yang 0001, Junda Wang, Huixue Zhou, Won-Seok Jang, Feiyun Ouyang, Hong Yu 0001. 6183-6257 [doi]
- Continual-learning for Modelling Low-Resource Languages from Large Language ModelsSantosh Srinath K, Mudit Somani, Varun Reddy Padala, Prajna Upadhyay, Abhijit Das. 6258-6275 [doi]
- Language-Grounded Multi-Domain Image Translation via Semantic Difference GuidanceJongwon Ryu, Joonhyung Park, Jaeho Han, Yeong-Seok Kim, Hye-Rin Kim, Sunjae Yoon, Junyeong Kim. 6276-6288 [doi]
- LLMs as Cultural Archives: Cultural Commonsense Knowledge Graph ExtractionJunior Cedric Tonga, Chen Cecilia Liu, Iryna Gurevych, Fajri Koto. 6289-6309 [doi]
- Nahw: A Comprehensive Benchmark of Arabic Grammar Understanding, Error Detection, Correction, and ExplanationHamdy Mubarak, Majd Hawasly, Abubakr Mohamed. 6310-6328 [doi]
- Rethinking Creativity Evaluation: A Critical Analysis of Existing Creativity EvaluationsLi-Chun Lu, Miri Liu, Pin-Chun Lu, Yufei Tian, Shao-Hua Sun, Nanyun Peng 0001. 6329-6352 [doi]
- TReX: Tokenizer Regression for Optimal Data MixtureInho Won, Hangyeol Yoo, Minkyung Cho, Jungyeul Park, Hoyun Song, Kyungtae Lim. 6353-6370 [doi]
- CONGRAD: Conflicting Gradient Filtering for Multilingual Preference AlignmentJiangnan Li, Thuy-Trang Vu, Christian Herold, Amirhossein Tebbifakhr, Shahram Khadivi, Gholamreza Haffari. 6371-6387 [doi]
- Activation-Space Personality Steering: Hybrid Layer Selection for Stable Trait Control in LLMsPranav Bhandari, Nicolas Fay, Sanjeevan Selvaganapathy, Amitava Datta, Usman Naseem, Mehwish Nasim. 6388-6403 [doi]
- Speculative Decoding Speed-of-Light: Optimal Lower Bounds via Branching Random WalksSergey Pankratov, Dan Alistarh. 6404-6418 [doi]
- KG-CRAFT: Knowledge Graph-based Contrastive Reasoning with LLMs for Enhancing Automated Fact-checkingVítor Lourenço, Aline Paes, Tillman Weyde, Audrey Depeige, Mohnish Dubey. 6419-6439 [doi]
- SciRAG: Adaptive, Citation-Aware, and Outline-Guided Retrieval and Synthesis for Scientific LiteratureHang Ding, Yilun Zhao 0001, Tiansheng Hu, Manasi Patwardhan 0001, Arman Cohan. 6440-6460 [doi]
- Unintended Memorization of Sensitive Information in Fine-Tuned Language ModelsMarton Szep, Jorge Marin Ruiz, Georgios Kaissis, Paulina Seidl, Rüdiger von Eisenhart-Rothe, Florian Hinterwimmer, Daniel Rueckert. 6461-6480 [doi]
- The Pluralistic Moral Gap: Understanding Moral Judgment and Value Differences between Humans and Large Language ModelsGiuseppe Russo 0001, Debora Nozza, Paul Röttger, Dirk Hovy. 6481-6497 [doi]
- CoReTab: Improving Multimodal Table Understanding with Code-driven ReasoningVan Quang Nguyen, Takayuki Okatani. 6498-6523 [doi]
- Explaining Generalization of AI-Generated Text Detectors Through Linguistic AnalysisYuxi Xia, Kinga Stanczak, Benjamin Roth 0001. 6524-6546 [doi]
- Elections go bananas: A First Large-scale Multilingual Study of Pluralia Tantum using LLMsElena Spaziani, Kamyar Zeinalipour, Pierluigi Cassotti, Nina Tahmasebi. 6547-6570 [doi]
- CacheNotes: Task-Aware Key-Value Cache Compression for Reasoning-Intensive Knowledge TasksGiulio Corallo, Orion Weller, Fabio Petroni, Paolo Papotti. 6571-6590 [doi]
- Beyond Blind Following: Evaluating Robustness of LLM Agents under Imperfect GuidanceYao Fu, Ran Qiu, Xinhe Wang 0001, Jacob Sansom, Sathvika Ayyappa Prabhu, Huijie Tang, Jaekyeom Kim, Sungryull Sohn, Honglak Lee. 6591-6618 [doi]
- How Do LLMs Generate Contrastive Sentiments? A Mechanistic PerspectiveVan Bach Nguyen, Jörg Schlötterer, Christin Seifert. 6619-6635 [doi]
- Continual Neural Topic ModelCharu Karakkaparambil James, Waleed Mustafa, Marcio Monteiro, Marius Kloft, Sophie Fellenz. 6636-6658 [doi]
- MAQuA: Multi-outcome Adaptive Question-Asking for Mental Health using Item Response TheoryVasudha Varadarajan, Hui Xu, Rebecca Astrid Boehme, Mariam Marlen Mirström, Sverker Sikström, H. Andrew Schwartz. 6659-6677 [doi]
- Principled Self-Correction in Discrete Diffusion: A UCB-Guided Framework for Text GenerationMasaki Asada, Makoto Miwa. 6678-6692 [doi]
- ConLID: Supervised Contrastive Learning for Low-Resource Language IdentificationNegar Foroutan, Jakhongir Saydaliev, Grace Kim, Antoine Bosselut. 6693-6708 [doi]
- Emotionally Charged, Logically Blurred: AI-driven Emotional Framing Impairs Human Fallacy DetectionYanran Chen, Lynn Greschner, Roman Klinger, Michael Klenk, Steffen Eger. 6709-6732 [doi]
- Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-VisualizationMizanur Rahman, Mohammed Saidul Islam, Md. Tahmid Rahman Laskar, Shafiq Joty, Enamul Hoque. 6733-6750 [doi]
- Offline Preference Optimization via Maximum Marginal Likelihood EstimationSaeed Najafi, Alona Fyshe. 6751-6764 [doi]
- The Relevance of Value Systems for Offensive Language DetectionMichael Wiegand, Elisabeth Eder, Josef Ruppenhofer. 6765-6789 [doi]
- Instruction Tuning with and without Context: Behavioral Shifts and Downstream ImpactHyunji Lee, Seunghyun Yoon 0002, YunJae Won, Hanseok Oh, Geewook Kim, Trung Bui, Franck Dernoncourt, Elias Stengel-Eskin, Mohit Bansal, Minjoon Seo. 6790-6810 [doi]
- RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language ModelsAashiq Muhamed, Leonardo F. R. Ribeiro, Markus Dreyer, Virginia Smith, Mona T. Diab. 6811-6856 [doi]
- Query Decomposition for RAG: Balancing Exploration-ExploitationRoxana Petcu, Kenton Murray, Daniel Khashabi, Evangelos Kanoulas, Maarten de Rijke, Dawn J. Lawrie, Kevin Duh. 6857-6871 [doi]
- Do Images Speak Louder than Words? Investigating the Effect of Textual Misinformation in VLMsChi Zhang, Wenxuan Ding 0004, Jiale Liu, Mingrui Wu, Qingyun Wu, Ray Mooney. 6872-6895 [doi]
- Sycophancy Hides Linearly in the Attention HeadsRifo Ahmad Genadi, Munachiso Nwadike, Nurdaulet Mukhituly, Tatsuya Hiraoka, Hilal AlQuabeh, Kentaro Inui. 6896-6912 [doi]
- AICD Bench: A Challenging Benchmark for AI-Generated Code DetectionDaniil Orel, Dilshod Azizov, Indraneil Paul, Yuxia Wang 0003, Iryna Gurevych, Preslav Nakov. 6913-6938 [doi]
- Safeguarding Language Models via Self-Destruct TrapdoorShahar Katz, Bar Alon, Ariel Shaulov, Lior Wolf, Mahmood Sharif. 6939-6958 [doi]
- Rethinking Hallucinations: Correctness, Consistency, and Prompt MultiplicityPrakhar Ganesh, Reza Shokri, Golnoosh Farnadi. 6959-6978 [doi]
- Hype or not? Formalizing Automatic Promotional Language Detection in Biomedical ResearchBojan Batalo, Erica K. Shimomoto, Dipesh Satav, Neil Millar. 6979-6992 [doi]
- H3Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMsSelim Furkan Tekin, Fatih Ilhan, Sihao Hu, Tiansheng Huang, Yichang Xu, Zachary Yahn, Ling Liu 0001. 6993-7013 [doi]
- Revisiting Generalization Across Difficulty Levels: It's Not So EasyYeganeh Kordi, Nihal V. Nayak, Max Zuo, Ilana Nguyen, Stephen H. Bach. 7014-7042 [doi]
- BLUR: A Bi-Level Optimization Approach for LLM UnlearningHadi Reisizadeh, Jinghan Jia, Zhiqi Bu, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang 0001, Volkan Cevher, Sijia Liu 0001, Mingyi Hong 0001. 7043-7058 [doi]
- DeepInsert: Early Layer Bypass for Efficient and Performant Multimodal UnderstandingMoulik Choraria, Xinbo Wu, Akhil Bhimaraju, Nitesh Sekhar, Yue Wu, Xu Zhang, Prateek Singhal, Lav R. Varshney. 7059-7079 [doi]
- Dynamic Cheatsheet: Test-Time Learning with Adaptive MemoryMirac Suzgun, Mert Yüksekgönül, Federico Bianchi 0001, Dan Jurafsky, James Zou 0001. 7080-7106 [doi]
- Evidential Semantic Entropy for LLM Uncertainty QuantificationLucie Kunitomo-Jacquin, Edison Marrese-Taylor, Ken Fukuda, Masahiro Hamasaki. 7107-7122 [doi]
- SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use CasesLaya Iyer, Angelina Wang, Sanmi Koyejo. 7123-7137 [doi]
- Incentivizing Strong Reasoning from Weak SupervisionYige Yuan, Teng Xiao, Shuchang Tao, Xue Wang 0010, Jinyang Gao, Bolin Ding, Bingbing Xu 0001. 7138-7156 [doi]
- DivMerge: A divergence-based model merging method for multi-taskingBrahim Touayouch, Loïc Fosse, Géraldine Damnati, Gwénolé Lecorvé. 7157-7180 [doi]
- A Reinforcement Learning Framework for Robust and Secure LLM WatermarkingLi An, Yujian Liu, Yepeng Liu, Yuheng Bu, Yang Zhang 0001, Shiyu Chang. 7181-7198 [doi]
- Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI AgentsSameer Komoravolu, Khalil Mrini. 7199-7214 [doi]
- User-Centric Evidence Ranking for Attribution and Fact VerificationGuy Alt, Eran Hirsch, Serwar Basch, Ido Dagan, Oren Glickman. 7215-7237 [doi]
- Beyond Understanding: Evaluating the Pragmatic Gap in LLMs' Cultural Processing of Figurative LanguageMena Attia, Aashiq Muhamed, Mai Alkhamissi, Thamar Solorio, Mona T. Diab. 7238-7265 [doi]
- VietMix: A Naturally-Occurring Parallel Corpus and Augmentation Framework for Vietnamese-English Code-Mixed Machine TranslationHieu Tran, Phuong-Anh Nguyen-Le, Huy Nghiem, Quang-Nhan Nguyen, Wei Ai, Marine Carpuat. 7266-7284 [doi]
- Do You See Me: A Multidimensional Benchmark for Evaluating Visual Perception in Multimodal LLMsAditya Sanjiv Kanade, Tanuja Ganu. 7285-7326 [doi]
- An Empirical Study of Collective Behaviors and Social Dynamics in Large Language Model AgentsFarnoosh Hashemi, Michael Macy. 7327-7351 [doi]
- Detecting Subtle Biases: An Ethical Lens on Underexplored Areas in AI Language Models BiasesShayan Bali, Farhan Farsi, Mohammad Hosseini, Adel Khorramrouz, Ehsaneddin Asgari. 7352-7379 [doi]
- HarfoSokhan: A Comprehensive Parallel Dataset for Transitions between Persian Colloquial and Formal VariationsHamid Jahad Sarvestani, Vida Ramezanian, Saee Saadat, Neda Taghizadeh Serajeh, Maryam Sadat Razavi Taheri, Shohreh Kasaei, MohammadAmin Fazli, Ehsaneddin Asgari. 7380-7392 [doi]
- Compressing Language Models for Specialized DomainsMiles Williams, George Chrysostomou, Vitor Jeronymo, Nikolaos Aletras. 7393-7415 [doi]
- GRAVITY: A Framework for Personalized Text Generation via Profile-Grounded Synthetic PreferencesPriyanka Dey, Daniele Rosa, Wenqing Zheng, Daniel Barcklow, Jieyu Zhao 0001, Emilio Ferrara. 7416-7436 [doi]
- Multimodal Conversation Structure UnderstandingKent K. Chang, Mackenzie Cramer, Anna Ho, Ti Ti Nguyen, Yilin Yuan, David Bamman. 7437-7458 [doi]
- A Review of Incorporating Psychological Theories in LLMsZizhou Liu, Ziwei Gong, Lin Ai, Zheng Hui, Run Chen, Colin Wayne Leach, Michelle R. Greene, Julia Hirschberg. 7459-7495 [doi]
- How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing CapabilitiesAly M. Kassem, Bernhard Schölkopf, Zhijing Jin 0001. 7496-7507 [doi]
- NG-Router: Graph-Supervised Multi-Agent Collaboration for Nutrition Question AnsweringKaiwen Shi, Zheyuan Zhang, Zhengqing Yuan, Keerthiram Murugesan, Vincent Galassi, Chuxu Zhang, Yanfang Ye 0001. 7508-7527 [doi]
- Verification-Aware Planning for Multi-Agent SystemsTianyang Xu, Dan Zhang 0025, Kushan Mitra, Estevam Hruschka. 7528-7546 [doi]
- Zero-Shot Open-Schema Entity Structure DiscoveryXueqiang Xu, Jinfeng Xiao, James Barry, Mohab Elkaref, Jiaru Zou, Pengcheng Jiang, Yunyi Zhang 0001, Maxwell J. Giammona, Geeth de Mel, Jiawei Han 0001. 7547-7561 [doi]
- Beyond Semantics: How Temporal Biases Shape Retrieval in Transformer and State-Space ModelsAnooshka Bajaj, Deven Mahesh Mistry, Sahaj Singh Maini, Yash Aggarwal, Zoran Tiganj. 7562-7581 [doi]
- Diagnosing Vision Language Models' Perception by Leveraging Human Methods for Color Vision DeficienciesKazuki Hayashi, Shintaro Ozaki, Yusuke Sakai 0010, Hidetaka Kamigaito, Taro Watanabe. 7582-7605 [doi]
- Tokenizer-Aware Cross-Lingual Adaptation of Decoder-Only LLMs through Embedding Relearning and SwappingFan Jiang 0014, Honglin Yu, Grace Chung, Trevor Cohn. 7606-7636 [doi]
- Active Generalized Category Discovery with Diverse LLM FeedbackHenry Peng Zou, Siffi Singh, Yi Nian, Jianfeng He, Jason Cai, Saab Mansour, Hang Su. 7637-7658 [doi]
- RAFFLES: Reasoning-based Attribution of Faults for LLM SystemsChenyang Zhu 0011, Spencer Hong, Jingyu Wu, Kushal Chawla, Yuhui Tang, Youbing Yin, Nathan Wolfe, Erin Babinsky, Daben Liu. 7659-7688 [doi]
- Jailbreaks as Inference-Time Alignment: A Framework for Understanding Safety Failures in LLMsJames Beetham, Souradip Chakraborty, Mengdi Wang 0001, Furong Huang, Amrit Singh Bedi, Mubarak Shah. 7689-7713 [doi]
- Over-Searching in Retrieval-Augmented Large Language ModelsRoy Xie, Deepak Gopinath, David Qiu, Dong Lin, Haitian Sun, Saloni Potdar, Bhuwan Dhingra. 7714-7739 [doi]
- LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative WritingDaniel Fein, Sebastian Russo, Violet Xiang, Kabir Jolly, Rafael Rafailov, Nick Haber. 7740-7755 [doi]
- H-Mem: Hybrid Multi-Dimensional Memory Management for Long-Context Conversational AgentsZihe Ye, Jingyuan Huang, Weixin Chen, Yongfeng Zhang. 7756-7775 [doi]
- "Yuki Gets Sushi, David Gets Steak?": Uncovering Gender and Racial Biases in LLM-Based Meal RecommendationsXuefeng Wei, Xuan Zhou, Yusuke Sakai 0010, Taro Watanabe. 7776-7796 [doi]
- Happiness is Sharing a Vocabulary: A Study of Transliteration MethodsHaeji Jung, Jinju Kim, Kyungjin Kim, Youjeong Roh, David R. Mortensen. 7797-7816 [doi]
- SCALAR: Scientific Citation-based Live Assessment of Long-context Academic ReasoningRenxi Wang, Honglin Mu, Liqun Ma, Lizhi Lin, Yunlong Feng, Timothy Baldwin, Xudong Han, Haonan Li 0002. 7817-7830 [doi]
- Look Before You Leap: A Lookahead Reasoning Quality Gate for Speculative DecodingHiroaki Kingetsu, Kaoru Yokoo, Kenji Fukumizu, Manohar Kaul. 7831-7847 [doi]
- FanarGuard: A Culturally-Aware Moderation Filter for Arabic Language ModelsMasoomali Fatehkia, Enes Altinisik, Husrev Taha Sencar. 7848-7869 [doi]
- BILLY: Steering Large Language Models via Merging Persona Vectors for Creative GenerationTsung-Min Pai, Jui-I Wang, Li-Chun Lu, Shao-Hua Sun, Hung-yi Lee, Kai-Wei Chang 0001. 7870-7915 [doi]
- Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative StoryVladislav Pedashenko, Laida Kushnareva, Yana Khassan Nibal, Eduard Tulchinskii, Kristian Kuznetsov, Vladislav Zharchinskii, Yury Maximov, Irina Piontkovskaya. 7916-7944 [doi]
- Image Corruption-Inspired Membership Inference Attacks against Large Vision-Language ModelsZongyu Wu 0001, Minhua Lin, Zhiwei Zhang 0028, Fali Wang, Xianren Zhang, Xiang Zhang 0001, Suhang Wang. 7945-7957 [doi]
- Language Lives in Sparse Dimensions: Toward Interpretable and Efficient Multilingual Control for Large Language ModelsChengzhi Zhong, Fei Cheng 0002, Qianying Liu, Yugo Murawaki, Chenhui Chu, Sadao Kurohashi. 7958-7970 [doi]
- Engagement Undermines Safety: How Stereotypes and Toxicity Shape Humor in Language ModelsAtharvan Dogra, Soumya Suvra Ghosal, Ameet Deshpande, Ashwin Kalyan, Dinesh Manocha. 7971-7990 [doi]
- Are All Prompt Components Value-Neutral? Understanding the Heterogeneous Adversarial Robustness of Dissected Prompt in LLMsYujia Zheng, Tianhao Li, Haotian Huang, Tianyu Zeng, Jingyu Lu, Chuangxin Chu, Yuekai Huang, Ziyou Jiang, Qian Xiong, Yuyao Ge, Mingyang Li. 7991-8019 [doi]
- A Regex Minimization Benchmark: A PSPACE-Complete Challenge for Language ModelsHyundong Jin, Joonghyuk Hahn, Yo-Sub Han. 8020-8048 [doi]
- Teaching Small Language Models to Learn Logic through Meta-LearningLeonardo Bertolazzi, Manuel Vargas Guzmán 0001, Raffaella Bernardi, Maciej Malicki, Jakub Szymanik. 8049-8080 [doi]
- COMPACT: Building Compliance Paralegals via Clause Graph Reasoning over ContractsAyush Singh, Dishank Aggarwal, Pranav Bhagat, Ainulla Khan, Sameer Malik, Amar Prakash Azad. 8081-8112 [doi]
- Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic DatasetsOmar Momen, Emilie Sitter, J. Berenike Herrmann, Sina Zarrieß. 8113-8127 [doi]
- Repairing Regex Vulnerabilities via Localization-Guided InstructionsSicheol Sung, Joonghyuk Hahn, Yo-Sub Han. 8128-8142 [doi]
- Do Psychometric Tests Work for Large Language Models? Evaluation of Tests on Sexism, Racism, and MoralityJana Jung, Marlene Lutz, Indira Sen, Markus Strohmaier. 8143-8173 [doi]
- ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error AnnotationsYindong Wang, Martin Preiß, Margarita Bugueño, Jan Vincent Hoffbauer, Abdullatif Ghajar, Tolga Buz, Gerard de Melo. 8174-8187 [doi]
- Cosine Similarity as Logits?: A Scalable Knowledge Probe Using Embedding Vectors from Generative Language ModelsTomoyuki Jinno, Kazuki Hayashi, Yusuke Sakai 0010, Hidetaka Kamigaito, Taro Watanabe. 8188-8200 [doi]
- Generating Multi-Aspect Queries for Conversational SearchZahra Abbasiantaeb, Simon Lupart, Mohammad Aliannejadi. 8201-8217 [doi]
- Navigating the Infinite Dynamic Web Space: Effective In-Context Exploration via Cognitive Multi-Agent CollaborationGuozhao Mo, Yanjiang Liu, Yafei Shi, Jiawei Chen 0011, Yang Li, Yaojie Lu 0001, Hongyu Lin, Ben He 0001, Le Sun 0001, Bo Zheng, Xianpei Han. 8218-8232 [doi]
- TimeMachine-bench: A Benchmark for Evaluating Model Capabilities in Repository-Level Migration TasksRyo Fujii, Makoto Morishita, Kazuki Yano, Jun Suzuki 0001. 8233-8264 [doi]
- Tandem Training for Language ModelsRobert West 0001, Ashton Anderson, Ece Kamar, Eric Horvitz. 8265-8278 [doi]
- Can MLLMs Find Their Way in a City? Exploring Emergent Navigation from Web-Scale KnowledgeDwip Dalal, Utkarsh Mishra, Narendra Ahuja, Nebojsa Jojic. 8279-8303 [doi]
- Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language ModelsAlla Chepurova, Aydar Bulatov, Mikhail Burtsev 0001, Yuri Kuratov. 8304-8319 [doi]
- CAIRE: Cultural Attribution of Images with RetrievalArnav Yayavaram, Siddharth Yayavaram, Simran Khanuja, Michael Saxon, Graham Neubig. 8320-8338 [doi]
- What Does Infect Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMsXinlan Yan, Di Wu, Yibin Lei, Christof Monz, Iacer Calixto. 8339-8358 [doi]
- Redefining Retrieval Evaluation in the Era of LLMsGiovanni Trappolini, Florin Cuconasu, Simone Filice, Yoelle Maarek, Fabrizio Silvestri. 8359-8375 [doi]
- Debate, Deliberate, Decide (D3): A Cost-Aware Adversarial Framework for Reliable and Interpretable LLM EvaluationAbir Harrasse, Chaithanya Bandi, Hari Bandi. 8376-8392 [doi]
- IYKYK: Using language models to decode extremist cryptolectsChristine de Kock, Arij Riabi, Zeerak Talat, Michael Sejr Schlichtkrull, Pranava Madhyastha, Eduard H. Hovy. 8393-8409 [doi]
- Stop Taking Tokenizers for Granted: They Are Core Design Decisions in Large Language ModelsSawsan Alqahtani, Mir Tafseer Nayeem, Md. Tahmid Rahman Laskar, Tasnim Mohiuddin, M. Saiful Bari. 8410-8432 [doi]