- Towards Automated Error Discovery: A Study in Conversational AI. Dominic Petrak, Thy Thy Tran, Iryna Gurevych. 1-23
- Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs. Mohsinul Kabir, Ajwad Abrar, Sophia Ananiadou. 24-51
- Biased Tales: Cultural and Topic Bias in Generating Children's Stories. Donya Rooein, Vilém Zouhar, Debora Nozza, Dirk Hovy. 52-72
- Large Language Models as Realistic Microservice Trace Generators. Donghyun Kim 0002, Sriram Ravula, Taemin Ha, Alex Dimakis, Daehyeok Kim, Aditya Akella. 73-91
- JUDGEBERT: Assessing Legal Meaning Preservation Between Sentences. David Beauchemin, Michelle Albert-Rochette, Richard Khoury, Pierre-Luc Déziel. 92-118
- QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments. David Beauchemin, Richard Khoury. 119-130
- Revisiting LLM Value Probing Strategies: Are They Robust and Expressive? Siqi Shen, Mehar Singh, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Rada Mihalcea. 131-145
- A Systematic Analysis of Base Model Choice for Reward Modeling. Kian Ahrabian, Pegah Jandaghi, Negar Mokhberian, Sai Praneeth Karimireddy, Jay Pujara. 146-164
- Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance. Branislav Pecher, Ivan Srba, Mária Bieliková. 165-184
- Is the Top Still Spinning? Evaluating Subjectivity in Narrative Understanding. Melanie Subbiah, Akankshya Mishra, Grace Kim, Liyan Tang, Greg Durrett, Kathleen McKeown. 185-203
- MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors. Jakub Macina, Nico Daheim, Ido Hakimi, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan. 204-221
- Preemptive Detection and Correction of Misaligned Actions in LLM Agents. Haishuo Fang, Xiaodan Zhu, Iryna Gurevych. 222-244
- Fingerprinting LLMs through Survey Item Factor Correlation: A Case Study on Humor Style Questionnaire. Simon Münker. 245-258
- Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval. Tianlu Zheng, Yifan Zhang, Xiang An, Ziyong Feng, Kaicheng Yang 0002, Qichuan Ding. 259-271
- From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning. David Dinucu-Jianu, Jakub Macina, Nico Daheim, Ido Hakimi, Iryna Gurevych, Mrinmaya Sachan. 272-292
- CompKBQA: Component-wise Task Decomposition for Knowledge Base Question Answering. Yuhang Tian, Dandan Song 0005, Zhijing Wu 0001, Pan Yang, Changzhi Zhou, Jun Yang, Hao Wang 0163, Huipeng Ma, Chenhao Li, Luan Zhang. 293-309
- Permutative Preference Alignment from Listwise Ranking of Human Judgments. Yang Zhao, Yixin Wang, Mingzhang Yin. 310-334
- ToneCraft: Cantonese Lyrics Generation with Harmony of Tones and Pitches. Junyu Cheng, Chang Pan, Shuangyin Li. 335-353
- SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition. Zechen Li, Shohreh Deldari, Linyao Chen, Hao Xue 0001, Flora D. Salim. 354-379
- MixLoRA-DSI: Dynamically Expandable Mixture-of-LoRA Experts for Rehearsal-Free Generative Retrieval over Dynamic Corpora. Tuan-Luc Huynh, Thuy-Trang Vu, Weiqing Wang 0001, Trung Le 0001, Dragan Gasevic, Yuan-Fang Li, Thanh-Toan Do. 380-396
- ViClaim: A Multilingual Multilabel Dataset for Automatic Claim Detection in Videos. Patrick Giedemann, Pius von Däniken, Jan Milan Deriu, Álvaro Rodrigo, Anselmo Peñas, Mark Cieliebak. 397-413
- DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments. Yuxiang Zheng, Dayuan Fu, Xiangkun Hu, Xiaojie Cai, Lyumanshan Ye, Pengrui Lu, Pengfei Liu 0003. 414-431
- Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning. Enjun Du, Siyi Liu, Yongqi Zhang. 432-453
- MPRF: Interpretable Stance Detection through Multi-Path Reasoning Framework. Zhaodan Zhang, Jin Zhang 0029, Hui Xu, Jiafeng Guo, Xueqi Cheng. 454-470
- Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels. Junjie Ye 0005, Yuming Yang, Yang Nan, Shuo Li, Qi Zhang 0001, Tao Gui, Xuanjing Huang 0001, Peng Wang 0095, Zhongchao Shi, Jianping Fan 0007. 471-513
- 2S: Joint Influence-Aware Instruction Data Selection for Efficient Fine-Tuning. Jingyu Wei, Bo Liu 0014, Tianjiao Wan, Baoyun Peng, Xingkong Ma, Mengmeng Guo. 514-527
- SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models. Xingjian Diao, Chunhui Zhang, Keyi Kong, Weiyi Wu, Chiyu Ma, Zhongyu Ouyang, Peijun Qing, Soroush Vosoughi, Jiang Gui. 528-540
- Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors. Xiangchen Wang, Jinrui Zhang, Teng Wang, Haigang Zhang, Feng Zheng 0001. 541-558
- RoT: Enhancing Table Reasoning with Iterative Row-Wise Traversals. Xuanliang Zhang, Dingzirui Wang, Keyan Xu, Qingfu Zhu, Wanxiang Che. 559-579
- T-MAD: Target-driven Multimodal Alignment for Stance Detection. Zhaodan Zhang, Jin Zhang, Xueqi Cheng, Hui Xu. 580-595
- Emotion Transfer with Enhanced Prototype for Unseen Emotion Recognition in Conversation. Kun Peng, Cong Cao 0001, Hao Peng 0001, Guanlin Wu, Zhifeng Hao, Lei Jiang 0003, Yanbing Liu 0007, Philip S. Yu. 596-608
- PBI-Attack: Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization. Ruoxi Cheng, Yizhong Ding, Shuirong Cao, Ranjie Duan, Xiaoshuang Jia, Shaowei Yuan, Simeng Qin, Zhiqiang Wang, Xiaojun Jia. 609-628
- Training a Utility-based Retriever Through Shared Context Attribution for Retrieval-Augmented Language Models. Yilong Xu, Jinhua Gao, Xiaoming Yu, Yuanhai Xue, Baolong Bi, Huawei Shen, Xueqi Cheng. 629-648
- SportReason: Evaluating Retrieval-Augmented Reasoning across Tables and Text for Sports Question Answering. Kaiyue Feng, Siyue Zhang, Bingsen Chen, Yilun Zhao 0001, Chen Zhao 0013. 649-662
- MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness. Junsheng Huang, Zhitao He 0001, Yuchen Huang, Sandeep Polisetty, Qingyun Wang 0005, Yi R. Fung 0001. 663-676
- CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation. Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du 0001, Yulan He 0001. 677-693
- PAFT: Prompt-Agnostic Fine-Tuning. Chenxing Wei, Mingwen Ou, Ying He 0006, Yao Shu, Fei Yu 0016. 694-717
- Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning. Linger Deng, Linghao Zhu, Yuliang Liu, Yu Wang, Qunyi Xie, Jingjing Wu, Gang Zhang, Yingying Zhu 0005, Xiang Bai. 718-735
- TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration. Yanshu Li, Jianjiang Yang, Tian Yun 0001, Pinyuan Feng, Jinfa Huang, Ruixiang Tang. 736-763
- Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey. Tianxin Xie, Yan Rong, Pengfei Zhang, Wenwu Wang 0001, Li Liu. 764-791
- Automating Steering for Safe Multimodal Large Language Models. Lyucheng Wu, Mengru Wang, Ziwen Xu, Tri Cao, Nay Oo, Bryan Hooi, Shumin Deng. 792-814
- EMNLP: Educator-role Moral and Normative Large Language Models Profiling. Yilin Jiang, Mingzi Zhang, Sheng Jin, Zengyi Yu, Xiangjie Kong 0001, Binghao Tu. 815-843
- TracSum: A New Benchmark for Aspect-Based Summarization with Sentence-Level Traceability in Medical Domain. Bohao Chu, MeiJie Li, Sameh Frihat, Chengyu Gu, Georg Lodde, Elisabeth Livingstone, Norbert Fuhr. 844-864
- Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning. Wenbin Hu 0001, Haoran Li 0003, Huihao Jing, Qi Hu, Ziqian Zeng, Sirui Han, Heli Xu, Tianshu Chu, Peizhao Hu, Yangqiu Song. 865-883
- Towards General-Domain Word Sense Disambiguation: Distilling Large Language Model into Compact Disambiguator. Liqiang Ming, Sheng-hua Zhong, Yuncong Li. 884-897
- SLoW: Select Low-frequency Words! Automatic Dictionary Selection for Translation on Large Language Models. Hongyuan Lu, Zixuan Li, Zefan Zhang, Wai Lam. 898-913
- Parallel Continuous Chain-of-Thought with Jacobi Iteration. Haoyi Wu, Zhihao Teng, Kewei Tu. 914-926
- EQA-RM: A Generative Embodied Reward Model with Test-time Scaling. Yuhang Chen, Zhen Tan, Tianlong Chen. 927-945
- Refusal-Aware Red Teaming: Exposing Inconsistency in Safety Evaluations. Yongkang Chen, Xiaohu Du, Xiaotian Zou, Chongyang Zhao, Huan Deng, Hu Li, Xiaohui Kuang. 946-955
- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking. Zekun Xi, Wenbiao Yin, Jizhan Fang, Jialong Wu 0007, Runnan Fang, Yong Jiang 0005, Pengjun Xie, Fei Huang 0002, Huajun Chen, Ningyu Zhang 0001. 956-976
- LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL. Yihan Wang, Peiyu Liu, Xin Yang. 977-991
- On Relation-Specific Neurons in Large Language Models. Yihong Liu 0001, Runsheng Chen, Lea Hirlimann, Ahmad Dawar Hakimi, Mingyang Wang 0003, Amir Hossein Kargaran, Sascha Rothe, François Yvon, Hinrich Schütze. 992-1022
- IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents. Hengyu An, Jinghuai Zhang, Tianyu Du, Chunyi Zhou 0001, Qingming Li, Tao Lin 0004, Shouling Ji. 1023-1039
- ProtoVQA: An Adaptable Prototypical Framework for Explainable Fine-Grained Visual Question Answering. Xingjian Diao, Weiyi Wu, Keyi Kong, Peijun Qing, Xinwen Xu, Ming Cheng 0004, Soroush Vosoughi, Jiang Gui. 1040-1057
- SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs. Yuanyang Yin, Yaqi Zhao, Yajie Zhang, Yuanxing Zhang, Ke Lin, Jiahao Wang, Xin Tao 0001, Pengfei Wan 0001, Wentao Zhang 0001, Feng Zhao. 1058-1070
- Molecular String Representation Preferences in Pretrained LLMs: A Comparative Study in Zero- & Few-Shot Molecular Property Prediction. George Arthur Baker, Mario Sanz-Guerrero, Katharina von der Wense. 1071-1085
- Weight-Aware Activation Sparsity with Constrained Bayesian Optimization Scheduling for Large Language Models. Ming Wang, Miao Zhang, Xuebo Liu 0002, Liqiang Nie. 1086-1098
- DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation. Ziming You, Yumiao Zhang, Dexuan Xu, Yiwei Lou, Yandong Yan, Wei Wang, Huamin Zhang, Yu Huang. 1099-1123
- VC4VG: Optimizing Video Captions for Text-to-Video Generation. Yang Du 0011, Zhuoran Lin, Kaiqiang Song, Biao Wang, Zhicheng Zheng, Tiezheng Ge, Bo Zheng 0007, Qin Jin. 1124-1138
- LaMP-QA: A Benchmark for Personalized Long-form Question Answering. Alireza Salemi, Hamed Zamani. 1139-1159
- The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations. Yubo Zhu, Dongrui Liu, Zecheng Lin, Wei Tong, Sheng Zhong 0002, Jing Shao. 1160-1176
- MCIP: Protecting MCP Safety via Model Contextual Integrity Protocol. Huihao Jing, Haoran Li 0003, Wenbin Hu 0001, Qi Hu, Heli Xu, Tianshu Chu, Peizhao Hu, Yangqiu Song. 1177-1194
- SAKI-RAG: Mitigating Context Fragmentation in Long-Document RAG via Sentence-level Attention Knowledge Integration. Wenyu Tao, Xiaofen Xing, Zeliang Li, Xiangmin Xu. 1195-1213
- Skeletons Matter: Dynamic Data Augmentation for Text-to-Query. Yuchen Ji, Bo Xu 0023, Jie Shi 0010, Jiaqing Liang, Deqing Yang, Yu Mao, Hai Chen, Yanghua Xiao. 1214-1236
- CondenseLM: LLMs-driven Text Dataset Condensation via Reward Matching. Cheng Shen, Yew-Soon Ong, Joey Tianyi Zhou. 1237-1252
- MovieCORE: COgnitive REasoning in Movies. Gueter Josmy Faure, Min-Hung Chen, Jia-Fong Yeh, Ying Cheng, Hung-Ting Su, Yung-Hao Tang, Shang-Hong Lai, Winston H. Hsu. 1253-1272
- Think Wider, Detect Sharper: Reinforced Reference Coverage for Document-Level Self-Contradiction Detection. Yuhao Chen, Yuanjie Lyu, Shuochen Liu, Chao Zhang 0096, Junhui Lv, Tong Xu 0001. 1273-1288
- DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture. Arijit Maji, Raghvendra Kumar 0003, Akash Ghosh, Anushka, Nemil Shah, Abhilekh Borah, Vanshika Shah, Nishant Mishra, Sriparna Saha 0001. 1289-1313
- LingGym: How Far Are LLMs from Thinking Like Field Linguists? Changbing Yang, Franklin Ma, Freda Shi, Jian Zhu. 1314-1340
- Learning from Few Samples: A Novel Approach for High-Quality Malcode Generation. Haijian Ma, Daizong Liu, Xiaowen Cai 0001, Pan Zhou 0001, Yulai Xie. 1341-1358
- Personality Matters: User Traits Predict LLM Preferences in Multi-Turn Collaborative Tasks. Sarfaroz Yunusov, Kaige Chen, Kazi Nishat Anwar, Ali Emami. 1359-1372
- VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search. Yiming Jia, Jiachen Li, Xiang Yue, Bo Li 0080, Ping Nie, Kai Zou, Wenhu Chen. 1373-1393
- Thinking Out Loud: Do Reasoning Models Know When They're Right? Qingcheng Zeng, Weihao Xuan, Leyang Cui, Rob Voigt. 1394-1407
- Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models. Weihao Xuan, Qingcheng Zeng, Heli Qi, Junjue Wang, Naoto Yokoya. 1408-1450
- Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs. Mengqi Liao, Xiangyu Xi, Ruinian Chen, Jia Leng, Yangen Hu, Ke Zeng, Shuai Liu, Huaiyu Wan. 1451-1463
- LLM Bias Detection and Mitigation through the Lens of Desired Distributions. Ingroj Shrestha, Padmini Srinivasan. 1464-1480
- MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering. Teng Lin, Yuyu Luo, Honglin Zhang, Jicheng Zhang, Chunlin Liu, Kaishun Wu, Nan Tang 0001. 1481-1494
- POSITION BIAS MITIGATES POSITION BIAS: Mitigate Position Bias Through Inter-Position Knowledge Distillation. Yifei Wang, Feng Xiong, Yong Wang, Linjing Li, Xiangxiang Chu, Daniel Dajun Zeng. 1495-1512
- MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation. Weihao Xuan, Rui Yang 0016, Heli Qi, Qingcheng Zeng, Yunze Xiao, Aosong Feng, Dairui Liu, Yun Xing, Junjue Wang, Fan Gao, Jinghui Lu, Yuang Jiang, Huitao Li, Xin Li 0079, Kunyu Yu, Ruihai Dong, Shangding Gu, Yuekang Li, Xiaofei Xie, Felix Juefei-Xu, Foutse Khomh, Osamu Yoshie, Qingyu Chen 0001, Douglas Teodoro, Nan Liu 0003, Randy Goebel, Lei Ma 0003, Edison Marrese-Taylor, Shijian Lu, Yusuke Iwasawa, Yutaka Matsuo, Irene Li. 1513-1532
- NL-Debugging: Exploiting Natural Language as an Intermediate Representation for Code Debugging. Weiming Zhang, Qingyao Li, Xinyi Dai, Jizheng Chen, Kounianhua Du, Weiwen Liu, Yasheng Wang, Ruiming Tang, Yong Yu 0001, Weinan Zhang 0001. 1533-1549
- Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD. Bryan Chen Zhengyu Tan, Daniel Wai Kit Chin, Zhengyuan Liu, Nancy F. Chen, Roy Ka-Wei Lee. 1550-1575
- POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion. Yuan Liu, Zhongyin Zhao, Le Tian, Haicheng Wang, Xubing Ye, Yangxiu You, Zilin Yu, Chuhan Wu, Zhou Xiao, Yang Yu 0038, Jie Zhou. 1576-1601
- Large Language Models for Automated Literature Review: An Evaluation of Reference Generation, Abstract Writing, and Review Composition. Xuemei Tang, Xufeng Duan, Zhenguang G. Cai. 1602-1617
- CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs. Nafiseh Nikeghbal, Amir Hossein Kargaran, Jana Diesner. 1618-1639
- From Schema to State: Zero-Shot Scheme-Only Dialogue State Tracking via Diverse Synthetic Dialogue and Step-by-Step Distillation. Huan Xu, Zequn Li, Wen Tang, Jian-Jun Zhang. 1640-1652
- Beyond the Surface: Measuring Self-Preference in LLM Judgments. Zhi-Yuan Chen, Hao Wang, Xinyu Zhang, Enrui Hu, Yankai Lin. 1653-1672
- Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders. Dong Shu, Xuansheng Wu, Haiyan Zhao 0003, Mengnan Du, Ninghao Liu. 1673-1682
- Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation. Hengran Zhang, Minghao Tang, Keping Bi, Jiafeng Guo, Shihao Liu, Daiting Shi, Dawei Yin 0001, Xueqi Cheng. 1683-1702
- CiteBART: Learning to Generate Citations for Local Citation Recommendation. Ege Yigit Çelik, Selma Tekir. 1703-1719
- Autoformalization in the Wild: Assessing LLMs on Real-World Mathematical Definitions. Lan Zhang, Marco Valentino, André Freitas. 1720-1738
- Culture Cartography: Mapping the Landscape of Cultural Knowledge. Caleb Ziems, William Barr Held, Jane Yu 0001, Amir Goldberg, David Grusky, Diyi Yang. 1739-1757
- Interpretability Analysis of Arithmetic In-Context Learning in Large Language Models. Gregory Polyakov, Christian Hepting, Carsten Eickhoff, Seyed Ali Bahrainian. 1758-1777
- SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence. Yao Zhang, Chenyang Lin, Shijie Tang, Haokun Chen, Shijie Zhou, Yunpu Ma, Volker Tresp. 1778-1818
- We Politely Insist: Your LLM Must Learn the Persian Art of Taarof. Nikta Gohari Sadr, Sahar Heidariasl, Karine Megerdoomian, Laleh Seyyed-Kalantari, Ali Emami. 1819-1838
- Unstructured Evidence Attribution for Long Context Query Focused Summarization. Dustin Wright 0001, Zain Muhammad Mujahid, Lu Wang 0008, Isabelle Augenstein, David Jurgens. 1839-1867
- RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language. Subrata Biswas, Mohammad Nur Hossain Khan, Bashima Islam. 1868-1894
- Cache-of-Thought: Master-Apprentice Framework for Cost-Effective Vision Language Model Reasoning. Mingyuan Wu, Jize Jiang, Haozhen Zheng, Meitang Li, Zhaoheng Li, Beitong Tian, Bo Chen 0025, Yongjoo Park, Minjia Zhang, ChengXiang Zhai, Klara Nahrstedt. 1895-1909
- Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models. Xuyang Liu, Yiyu Wang, Junpeng Ma, Linfeng Zhang. 1910-1924
- Router-Tuning: A Simple and Effective Approach for Dynamic Depth. Shwai He, Tao Ge 0001, Guoheng Sun, Bowei Tian, Xiaoyang Wang 0001, Dong Yu 0001. 1925-1938
- Foot-In-The-Door: A Multi-turn Jailbreak for LLMs. Zixuan Weng, Xiaolong Jin, Jinyuan Jia, Xiangyu Zhang. 1939-1950
- TurnaboutLLM: A Deductive Reasoning Benchmark from Detective Games. Yuan Yuan, Muyu He, Muhammad Adil Shahid, Ziyang Li, Jiani Huang, Li Zhang. 1951-1965
- Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling. Minghui Li, Hao Zhang, Yechao Zhang, Wei Wan, Shengshan Hu, Pei Xiaobing, Jing Wang. 1966-1978
- Direct Judgement Preference Optimization. PeiFeng Wang, Austin Xu, Yilun Zhou, Caiming Xiong, Shafiq Joty. 1979-2009
- WebInject: Prompt Injection Attack to Web Agents. Xilong Wang, John Bloch, Zedian Shao, Yuepeng Hu, Shuyan Zhou, Neil Zhenqiang Gong. 2010-2030
- F²Bench: An Open-ended Fairness Evaluation Benchmark for LLMs with Factuality Considerations. Tian Lan, Jiang Li, Yemin Wang, Xu Liu, Xiangdong Su, Guanglai Gao. 2031-2046
- Value Profiles for Encoding Human Variation. Taylor Sorensen, Pushkar Mishra, Roma Patel, Michael Henry Tessler, Michiel A. Bakker, Georgina Evans, Iason Gabriel, Noah D. Goodman, Verena Rieser. 2047-2095
- Language Models as Causal Effect Generators. Lucius E. J. Bynum, KyungHyun Cho. 2096-2115
- Constructions are Revealed in Word Distributions. Joshua Rozner, Leonie Weissweiler, Kyle Mahowald, Cory Shain. 2116-2138
- CodeMixBench: Evaluating Code-Mixing Capabilities of LLMs Across 18 Languages. Yilun Yang, Yekun Chai. 2139-2169
- RBPtool: A Deep Language Model Framework for Multi-Resolution RBP-RNA Binding Prediction and RNA Molecule Design. Jiyue Jiang, Yitao Xu, Zikang Wang, Yihan Ye, Yanruisheng Shao, Yuheng Shan, Jiuming Wang, Xiaodan Fan, Jiao Yuan, Yu Li 0006. 2170-2185
- Unveiling Internal Reasoning Modes in LLMs: A Deep Dive into Latent Reasoning vs. Factual Shortcuts with Attribute Rate Ratio. Yiran Yang, Haifeng Sun 0001, Jingyu Wang, Qi Qi, Zirui Zhuang, Huazheng Wang, Pengfei Ren 0001, Jing Wang, Jianxin Liao. 2186-2206
- SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models. Zirui He, Mingyu Jin, Bo Shen, Ali Payani, Yongfeng Zhang, Mengnan Du. 2207-2236
- BabyLM's First Constructions: Causal interventions provide a signal of learning. Joshua Rozner, Leonie Weissweiler, Cory Shain. 2237-2249
- Effective Red-Teaming of Policy-Adherent Agents. Itay Nakash, George Kour, Koren Lazar, Matan Vetzler, Guy Uziel, Ateret Anaby-Tavor. 2250-2268
- CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering. Zongxi Li, Yang Li 0072, Haoran Xie 0001, S. Joe Qin. 2269-2288
- SafeScientist: Enhancing AI Scientist Safety for Risk-Aware Scientific Discovery. Kunlun Zhu, Jiaxun Zhang, Ziheng Qi, Nuoxing Shang, Zijia Liu, Peixuan Han, Yu Su 0001, Haofei Yu, Jiaxuan You. 2289-2317
- Improving Informally Romanized Language Identification. Adrian Benton, Alexander Gutkin, Christo Kirov, Brian Roark. 2318-2336
- Integral Transformer: Denoising Attention, Not Too Much Not Too Little. Ivan Kobyzev, Abbas Ghaddar, Dingtao Hu, Boxing Chen. 2337-2354
- CHENGYU-BENCH: Benchmarking Large Language Models for Chinese Idiom Understanding and Use. Yicheng Fu, Zhemin Huang, Liuxin Yang, Yumeng Lu, Zhongdongming Dai. 2355-2366
- Improving Cross Lingual Transfer by Pretraining with Active Forgetting. Divyanshu Aggarwal, Ashutosh Sathe, Sunayana Sitaram. 2367-2378
- Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization. Shuo Xing, Peiran Li, Yuping Wang, Ruizheng Bai, Yueqi Wang, Chan-Wei Hu, Chengxuan Qian, Huaxiu Yao, Zhengzhong Tu. 2379-2397
- To Mask or to Mirror: Human-AI Alignment in Collective Reasoning. Crystal Qian, Aaron T. Parisi, Clémentine Bouleau, Vivian Tsai, Maël Lebreton, Lucas Dixon. 2398-2423
- SWAN: An Efficient and Scalable Approach for Long-Context Language Modeling. Krishna C. Puvvada, Faisal Ladhak, Santiago Akle Serano, Cheng-Ping Hsieh, Shantanu Acharya, Somshubra Majumdar, Fei Jia, Samuel Kriman, Simeng Sun, Dima Rekesh, Boris Ginsburg. 2424-2438
- LLMs Behind the Scenes: Enabling Narrative Scene Illustration. Melissa Roemmele, John Joon Young Chung, Taewook Kim 0001, Yuqian Sun, Alex Calderwood, Max Kreminski. 2439-2457
- REARANK: Reasoning Re-ranking Agent via Reinforcement Learning. Le Zhang, Bo Wang 0084, Xipeng Qiu, Siva Reddy, Aishwarya Agrawal. 2458-2471
- Large Language Models Do Multi-Label Classification Differently. Marcus Ma, Georgios Chochlakis, Niyantha Maruthu Pandiyan, Jesse Thomason, Shrikanth Narayanan. 2472-2495
- FilBench: Can LLMs Understand and Generate Filipino? Lester James Validad Miranda, Elyanah Aco, Conner G. Manuel, Jan Christian Blaise Cruz, Joseph Marvin Imperial. 2496-2529
- M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis. Chengyan Wu, Bolei Ma, Yihong Liu 0001, Zheyu Zhang 0007, Ningyuan Deng, Yanshu Li, Baolan Chen, Yi Zhang, Yun Xue, Barbara Plank. 2530-2557
- RuCCoD: Towards Automated ICD Coding in Russian. Alexandr Nesterov, Andrey Sakhovskiy, Ivan Sviridov, Airat Valiev, Vladimir Makharev, Petr Anokhin, Galina Zubkova, Elena Tutubalina. 2558-2585
- Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs. Dayu Yang, Tianyang Liu 0003, Daoan Zhang, Antoine Simoulin, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Xin Qian, Grey Yang, Jiebo Luo 0001, Julian J. McAuley. 2586-2616
- Efficient Model Development through Fine-tuning Transfer. Pin-Jie Lin, Rishab Balasubramanian, Fengyuan Liu, Nikhil Kandpal, Tu Vu. 2617-2636
- Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes. Mingyang Wang 0003, Lukas Lange, Heike Adel, Yunpu Ma, Jannik Strötgen, Hinrich Schütze. 2637-2665
- User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal. Yuhan Liu, Michael Jq Zhang, Eunsol Choi. 2666-2681
- Read to Hear: A Zero-Shot Pronunciation Assessment Using Textual Descriptions and LLMs. Yu-Wen Chen, Melody Ma, Julia Hirschberg. 2682-2694
- COCO-Tree: Compositional Hierarchical Concept Trees for Enhanced Reasoning in Vision-Language Models. Sanchit Sinha, Guangzhi Xiong, Aidong Zhang 0001. 2695-2711
- SurveyGen: Quality-Aware Scientific Survey Generation with Large Language Models. Tong Bao, Mir Tafseer Nayeem, Davood Rafiei, Chengzhi Zhang. 2712-2736
- VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing. Zhisheng Zheng, Puyuan Peng, Anuj Diwan, Cong Phuoc Huynh, Xiaohang Sun, Zhu Liu, Vimal Bhat, David Harwath. 2737-2756
- From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge. Dawei Li 0008, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, Zhen Tan 0001, Amrita Bhattacharjee, Yuxuan Jiang, Canyu Chen, Tianhao Wu, Kai Shu, Lu Cheng 0001, Huan Liu 0001. 2757-2791
- MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification. Iustin Sirbu, Robert-Adrian Popovici, Cornelia Caragea, Stefan Trausan-Matu, Traian Rebedea. 2792-2808
- TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games. Prakamya Mishra, Jiang Liu, Jialian Wu, Xiaodong Yu, Zicheng Liu, Emad Barsoum. 2809-2831
- Learning from Diverse Reasoning Paths with Routing and Collaboration. Zhenyu Lei 0004, Zhen Tan 0001, Song Wang 0013, Yaochen Zhu, Zihan Chen 0002, Yushun Dong, Jundong Li. 2832-2845
- Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning. Jiayuan Zhu, JiaZhen Pan, Yuyuan Liu, Fenglin Liu, JunDe Wu. 2846-2857
- MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models. Shrey Pandit, Jiawei Xu 0006, Junyuan Hong, Zhangyang Wang, Tianlong Chen 0001, Kaidi Xu, Ying Ding 0001. 2858-2873
- NUTMEG: Separating Signal From Noise in Annotator Disagreement. Jonathan Ivey, Susan Gauch, David Jurgens. 2874-2887
- Alignment Quality Index (AQI): Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations. Abhilekh Borah, Chhavi Sharma, Danush Khanna, Utkarsh Bhatt, Gurpreet Singh, Hasnat Md Abdullah, Raghav Kaushik Ravi, Vinija Jain, Jyoti Patel, Shubham Singh, Vasu Sharma, Arpita Vats, Rahul Raja, Aman Chadha, Amitava Das 0001. 2888-2947
- MythTriage: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform. Hayoung Jung, Shravika Mittal, Ananya Aatreya, Navreet Kaur 0002, Munmun De Choudhury, Tanushree Mitra. 2948-2982
- Demystifying optimized prompts in language models. Rimon Melamed, Lucas H. McCabe, H. Howie Huang. 2983-2999
- Whisper-UT: A Unified Translation Framework for Speech and Text. Cihan Xiao, Matthew Wiesner, Debashish Chakraborty, Reno Kriz, Keith Cunningham, Kenton Murray, Kevin Duh, Luis Tavarez-Arce, Paul McNamee, Sanjeev Khudanpur. 3000-3016
- Unleashing the Reasoning Potential of LLMs by Critique Fine-Tuning on One Problem. Yubo Wang, Ping Nie, Kai Zou, Lijun Wu, Wenhu Chen. 3017-3027
- Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation. Hongxiang Zhang, Hao Chen, Muhao Chen, Tianyi Zhang. 3028-3046
- BBScoreV2: Learning Time-Evolution and Latent Alignment from Stochastic Representation. Tianhao Zhang, Zhecheng Sheng, Zhexiao Lin, Chen Jiang, Dongyeop Kang. 3047-3061
- SAND: Boosting LLM Agents with Self-Taught Action Deliberation. Yu Xia 0007, Yiran Shen 0004, Junda Wu, Tong Yu 0001, SungChul Kim, Ryan A. Rossi, Lina Yao 0001, Julian J. McAuley. 3062-3077
- LLMs as World Models: Data-Driven and Human-Centered Pre-Event Simulation for Disaster Impact Assessment. Lingyao Li, Dawei Li, Zhenhui Ou, Xiaoran Xu, Jingxiao Liu, Zihui Ma, Runlong Yu, Min Deng. 3078-3096
- Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values? Hua Shen 0005, Nicholas Clark, Tanu Mitra. 3097-3118
- Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time. Jiazheng Li 0002, Yuxiang Zhou, Junru Lu, Gladys Tyen, Lin Gui 0003, Cesare Aloisi, Yulan He 0001. 3119-3140
- Image Embedding Sampling Method for Diverse Captioning. Sania Waheed, Na Min An. 3141-3157
- Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time. Huihan Li 0001, You Chen, Siyuan Wang, Yixin He, Ninareh Mehrabi, Rahul Gupta 0001, Xiang Ren 0001. 3158-3180
- FANS: Formal Answer Selection for LLM Natural Language Math Reasoning Using Lean4. Jiarui Yao, Ruida Wang, Tong Zhang. 3181-3200
- Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning. Gagan Bhatia, Maxime Peyrard, Wei Zhao. 3201-3219
- Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark. Jianyou Wang, Weili Cao, Longtian Bao, Youze Zheng, Gil Pasternak, Kaicheng Wang, Xiaoyue Wang, Ramamohan Paturi, Leon Bergen. 3220-3248
- SHIFT: Selected Helpful Informative Frame for Video-guided Machine Translation. Boyu Guan, Chuang Han, Yining Zhang, Yupu Liang, Zhiyang Zhang, Yang Zhao 0007, Chengqing Zong. 3249-3267
- Surge: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors. Bohan Lyu 0001, Siqiao Huang, Zichen Liang, Qian Sun, Jiaming Zhang. 3268-3308
- Few-Shot Learning Translation from New Languages. Carlos Mullov, Alexander Waibel. 3309-3330
- Humanizing Machines: Rethinking LLM Anthropomorphism Through a Multi-Level Framework of Design. Yunze Xiao, Lynnette Hui Xian Ng, Jiarui Liu 0004, Mona T. Diab. 3331-3350
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs. Heming Xia, Chak Tou Leong, Wenjie Wang 0007, Yongqi Li 0001, Wenjie Li 0002. 3351-3363
- Are Generative Models Underconfident? Better Quality Estimation with Boosted Model Probability. Tu Anh Dinh, Jan Niehues. 3364-3382
- reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs. Zhaofeng Wu, Michihiro Yasunaga, Andrew Cohen, Yoon Kim, Asli Celikyilmaz, Marjan Ghazvininejad. 3383-3409
- Why Do Some Inputs Break Low-Bit LLM Quantization? Ting-Yun Chang, Muru Zhang, Jesse Thomason, Robin Jia. 3410-3429
- LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation. Keisuke Kamahori, Jungo Kasai, Noriyuki Kojima, Baris Kasikci. 3430-3442
- AROMA: Autonomous Rank-one Matrix Adaptation. Hao Nan Sheng, Zhi-Yong Wang, Hing-Cheung So, Mingrui Yang. 3443-3459
- Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens. Ziyang Ma, Qingyue Yuan, Zhenglin Wang, Deyu Zhou. 3460-3477
- Anchoring-Guidance Fine-Tuning (AnGFT): Elevating Professional Response Quality in Role-Playing Conversational Agents. Qibin Li, Zhen Xu, Shengyuan Bai, Nianmin Yao, Kaili Sun, Bowen Wu, Ying Li, Baoxun Wang. 3478-3496
- RiTTA: Modeling Event Relations in Text-to-Audio Generation. Yuhang He, Yash Jain, Xubo Liu, Andrew Markham, Vibhav Vineet. 3497-3511
- Shallow Focus, Deep Fixes: Enhancing Shallow Layers Vision Attention Sinks to Alleviate Hallucination in LVLMs. Xiaofeng Zhang, Yihao Quan, Chen Shen 0003, Chaochen Gu, Xiaosong Yuan, Shaotian Yan, Jiawei Cao, Hao Cheng 0004, Kaijie Wu 0002, Jieping Ye. 3512-3534
- WangchanThaiInstruct: An instruction-following Dataset for Culture-Aware, Multitask, and Multi-domain Evaluation in Thai. Peerat Limkonchotiwat, Pume Tuchinda, Lalita Lowphansirikul, Surapon Nonesung, Panuthep Tasawong, Alham Fikri Aji, Can Udomcharoenchaikit, Sarana Nutanong. 3535-3558
- MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models. Zhengyi Zhao 0001, Shubo Zhang, Yuxi Zhang, Yanxi Zhao, Yifan Zhang 0004, Zezhong Wang 0004, Huimin Wang, Yutian Zhao, Bin Liang 0004, Yefeng Zheng 0001, Binyang Li, Kam-Fai Wong, Xian Wu 0001. 3559-3582
- A Comprehensive Literary Chinese Reading Comprehension Dataset with an Evidence Curation Based Solution. Dongning Rao, Rongchu Zhou, Peng Chen, Zhihua Jiang. 3583-3603
- Dialect-SQL: An Adaptive Framework for Bridging the Dialect Gap in Text-to-SQL. Jie Shi 0010, Xi Cao, Bo Xu 0023, Jiaqing Liang, Yanghua Xiao, Jia Chen 0037, Peng Wang 0027, Wei Wang 0009. 3604-3619
- FinMTEB: Finance Massive Text Embedding Benchmark. Yixuan Tang, Yi Yang. 3620-3638
- Scaling Rich Style-Prompted Text-to-Speech DatasetsAnuj Diwan, Zhisheng Zheng, David Harwath, Eunsol Choi. 3639-3659 [doi]
- Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMsMahammed Kamruzzaman, Gene Louis Kim. 3660-3678 [doi]
- Eliciting Implicit Acoustic Styles from Open-domain Instructions to Facilitate Fine-grained Controllable Generation of SpeechJianxing Yu, Zihao Gou, Chen Li, Zhisheng Wang 0001, Peiji Yang, Wenqing Chen, Jian Yin 0001. 3679-3695 [doi]
- OBLIVIATE: Robust and Practical Machine Unlearning for Large Language ModelsXiaoyu Xu, Minxin Du, Qingqing Ye 0001, Haibo Hu 0001. 3696-3715 [doi]
- AdaptThink: Reasoning Models Can Learn When to ThinkJiajie Zhang, Nianyi Lin, Lei Hou 0001, Ling Feng, Juanzi Li. 3716-3730 [doi]
- 2: An Adaptive Test-Time Scaling Strategy for Contextual Question AnsweringZhengyi Zhao 0001, Shubo Zhang, Zezhong Wang 0004, Huimin Wang, Yutian Zhao, Bin Liang 0004, Yefeng Zheng 0001, Binyang Li, Kam-Fai Wong, Xian Wu 0001. 3731-3756 [doi]
- Non-Existent Relationship: Fact-Aware Multi-Level Machine-Generated Text DetectionYang Wu, Ruijia Wang, Jie Wu. 3757-3768 [doi]
- Calibrating Verbal Uncertainty as a Linear Feature to Reduce HallucinationsZiwei Ji 0001, Lei Yu, Yeskendir Koishekenov, Yejin Bang, Anthony Hartshorn, Alan Schelten, Cheng Zhang, Pascale Fung, Nicola Cancedda. 3769-3793 [doi]
- JUREX-4E: Juridical Expert-Annotated Four-Element Knowledge Base for Legal ReasoningHuanghai Liu, Quzhe Huang, Qingjing Chen, Yiran Hu, Jiayu Ma, Yun Liu, Weixing Shen, Yansong Feng 0002. 3794-3814 [doi]
- CIE: Controlling Language Model Text Generations Using Continuous SignalsVinay Samuel, Harshita Diddee, Yiming Zhang, Daphne Ippolito. 3815-3825 [doi]
- Stand on The Shoulders of Giants: Building JailExpert from Previous Attack ExperienceXi Wang, Songlei Jian, Shasha Li, Xiaopeng Li, Bin Ji, Ma Jun, Xiaodong Liu, Jing Wang, Jianfeng Zhang, Jie Yu, Feilong Bao, Wangbaosheng. 3826-3843 [doi]
- Language-to-Space Programming for Training-Free 3D Visual GroundingBoyu Mi, Hanqing Wang, Tai Wang, Yilun Chen, Jiangmiao Pang. 3844-3864 [doi]
- RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented InstructionsWanlong Liu, Junying Chen, Ke-ji, Li Zhou 0010, Wenyu Chen 0001, Benyou Wang. 3865-3888 [doi]
- AdaRewriter: Unleashing the Power of Prompting-based Conversational Query Reformulation via Test-Time AdaptationYilong Lai, Jialong Wu 0007, Zhenglin Wang, Deyu Zhou. 3889-3905 [doi]
- SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?Xudong Lu, Haohao Gao, Renshou Wu, Shuai Ren, Xiaoxin Chen 0001, Hongsheng Li 0001, Fangyuan Li. 3906-3931 [doi]
- F2TEval: Human-Aligned Multi-Dimensional Evaluation for Figure-to-Text TaskTan Yue, Rui Mao 0010, Zilong Song, Zonghai Hu, Dongyan Zhao 0001. 3932-3948 [doi]
- 2: Aligning Large Language Models Using Self-Synthetic Preference Data via Inherent RegulationQiyuan Chen 0003, Hongsen Huang, Qian Shao, Jiahe Chen, Jintai Chen, Hongxia Xu, Renjie Hua, Ren Chuan, Jian Wu 0001. 3949-3968 [doi]
- DSCD: Large Language Model Detoxification with Self-Constrained DecodingMing Dong 0004, Jinkui Zhang, Bolong Zheng, Xinhui Tu, Po Hu 0001, Tingting He 0003. 3969-3984 [doi]
- From Reasoning to Answer: Empirical, Attention-Based and Mechanistic Insights into Distilled DeepSeek R1 ModelsJue Zhang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang 0001. 3985-4002 [doi]
- Quantifying Language Disparities in Multilingual Large Language ModelsSongbo Hu, Ivan Vulic, Anna Korhonen. 4003-4018 [doi]
- KoBLEX: Open Legal Question Answering with Multi-hop ReasoningJihyung Lee, Daehui Kim, Seonjeong Hwang, Hyounghun Kim, Gary Lee. 4019-4053 [doi]
- End-to-End Learnable Psychiatric Scale Guided Risky Post Screening for Depression Detection on Social MediaBichen Wang, Yuzhe Zi, Yixin Sun, Hao Yang 0066, Yanyan Zhao, Bing Qin 0001. 4054-4066 [doi]
- ReAgent: Reversible Multi-Agent Reasoning for Knowledge-Enhanced Multi-Hop QAXinjie Zhao 0004, Fan Gao, Xingyu Song, Yingjian Chen, Rui Yang 0016, Yanran Fu, Yuyang Wang, Yusuke Iwasawa, Yutaka Matsuo, Irene Li. 4067-4089 [doi]
- Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials SciencePeter A. Jansen, Samiah Hassan, Ruoyao Wang. 4090-4102 [doi]
- ModRWKV: Transformer Multimodality in Linear TimeJiale Kang, Ziyin Yue, Qingyu Yin, Rui Jiang, Weile Li, Zening Lu, Zhouran Ji. 4103-4115 [doi]
- Multimedia Event Extraction with LLM Knowledge EditingJiaao Yu, Yijing Lin, Zhipeng Gao 0001, Xuesong Qiu 0001, Lanlan Rui. 4116-4124 [doi]
- Exploring the Impact of Personality Traits on LLM Bias and ToxicityShuo Wang 0013, Renhao Li, Xi Chen, Yulin Yuan, Min Yang 0007, Derek F. Wong. 4125-4143 [doi]
- Task-aware Contrastive Mixture of Experts for Quadruple Extraction in Conversations with Code-like Replies and Non-opinion DetectionChenyuan He, Yuxiang Jia, Fei Gao, Senbin Zhu, Hongde Liu 0002, Hongying Zan, Min Peng. 4144-4159 [doi]
- Mitigating Biases in Language Models via Bias UnlearningDianqing Liu, Yi Liu, Guoqing Jin, Zhendong Mao 0001. 4160-4178 [doi]
- UNComp: Can Matrix Entropy Uncover Sparsity? - A Compressor Design from an Uncertainty-Aware PerspectiveJing Xiong, Jianghan Shen, Fanghua Ye 0001, Chaofan Tao, Zhongwei Wan, Jianqiao Lu, Xun Wu, Chuanyang Zheng, Zhijiang Guo, Min Yang 0007, Lingpeng Kong, Ngai Wong 0001. 4179-4199 [doi]
- Superpose Task-specific Features for Model MergingHaiquan Qiu, You Wu, Dong Li, Jianmin Guo, Quanming Yao. 4200-4214 [doi]
- FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial DomainSuifeng Zhao, Zhuoran Jin, Sujian Li, Jun Gao. 4215-4249 [doi]
- BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking MechanismQinzhuo Wu, Pengzhi Gao, Wei Liu, Jian Luan 0001. 4250-4272 [doi]
- Diffusion vs. Autoregressive Language Models: A Text Embedding PerspectiveSiyue Zhang, Yilun Zhao 0001, Liyuan Geng, Arman Cohan, Anh Tuan Luu, Chen Zhao 0013. 4273-4303 [doi]
- BannerAgency: Advertising Banner Design with Multimodal LLM AgentsHeng Wang, Yotaro Shimose, Shingo Takamatsu. 4304-4329 [doi]
- DIDS: Domain Impact-aware Data Sampling for Large Language Model TrainingWeijie Shi, Jipeng Zhang, Yaguang Wu, Jingzhi Fang, Shibo Zhang, Yao Zhao, Hao Chen, Ruiyuan Zhang, Yue Cui 0001, Jia Zhu 0003, Sirui Han, Jiajie Xu 0001, Xiaofang Zhou 0001. 4330-4350 [doi]
- Training LLMs to be Better Text Embedders through Bidirectional ReconstructionChang Su, Dengliang Shi, Siyuan Huang 0003, Jintao Du, Changhua Meng, Yu Cheng 0005, Weiqiang Wang 0002, Zhouhan Lin. 4351-4369 [doi]
- ReMedy: Learning Machine Translation Evaluation from Human Preferences with Reward ModelingShaomu Tan, Christof Monz. 4370-4387 [doi]
- SolEval: Benchmarking Large Language Models for Repository-level Solidity Smart Contract GenerationZhiyuan Peng, Xin Yin, Rui Qian 0002, Peiqin Lin, Yongkang Liu, Hao Zhang, Chenhao Ying 0001, Yuan Luo 0003. 4388-4411 [doi]
- In-Context Learning Boosts Speech Recognition via Human-like Adaptation to Speakers and Language VarietiesNathan Roll, Calbert Graham, Yuka Tatsumi, Kim Tien Nguyen, Meghan Sumner, Dan Jurafsky. 4412-4426 [doi]
- Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning SkillsChangsheng Wang, Chongyu Fan, Yihua Zhang, Jinghan Jia, Dennis Wei, Parikshit Ram, Nathalie Baracaldo, Sijia Liu 0001. 4427-4443 [doi]
- Chain-of-Talkers (CoTalk): Fast Human Annotation of Dense Image CaptionsYijun Shen, Delong Chen, Fan Liu, Xingyu Wang, Chuanyi Zhang, Liang Yao, Yuhui Zheng. 4444-4464 [doi]
- DecoupleSearch: Decouple Planning and Search via Hierarchical Reward ModelingHao Sun 0015, Zile Qiao, Bo Wang 0134, Guoxin Chen, Yingyan Hou, Yong Jiang 0005, Pengjun Xie, Fei Huang 0002, Yan Zhang 0117. 4465-4478 [doi]
- RewardDS: Privacy-Preserving Fine-Tuning for Large Language Models via Reward Driven Data SynthesisJianwei Wang, Chengming Shi, Junyao Yang, Haoran Li, Qianli Ma, Huiping Zhuang, Cen Chen 0002, Ziqian Zeng. 4479-4500 [doi]
- Synergizing Multimodal Temporal Knowledge Graphs and Large Language Models for Social Relation RecognitionHaorui Wang, Zheng Wang, Yuxuan Zhang, Bo Wang, Bin Wu. 4501-4520 [doi]
- LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements GenerationChaeeun Kim, Jinu Lee, Wonseok Hwang. 4521-4554 [doi]
- ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question AnsweringJingxuan Wei, Nan Xu, Junnan Zhu, Haoyanni, Gaowei Wu, Qi Chen, Bihui Yu, Lei Wang. 4555-4569 [doi]
- COLA: Collaborative Multi-Agent Framework with Dynamic Task Scheduling for GUI AutomationDi Zhao, Longhui Ma, Siwei Wang 0001, Miao Wang, Zhao Lv. 4570-4593 [doi]
- DASA-Trans-STM: Adaptive Efficient Transformer for Short Text Matching using Data Augmentation and Semantic AwarenessJiguo Liu, Chao Liu 0020, Meimei Li, Nan Li, Shihao Gao, Dali Zhu. 4594-4610 [doi]
- Pruning the Paradox: How CLIP's Most Informative Heads Enhance Performance While Amplifying BiasAvinash Madasu, Vasudev Lal, Phillip Howard. 4611-4626 [doi]
- CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank ActivationZiyue Liu, Ruijie Zhang, Zhengyang Wang, Mingsong Yan, Zi Yang, Paul D. Hovland, Bogdan Nicolae, Franck Cappello, Sui Tang, Zheng Zhang. 4627-4645 [doi]
- TS-CLIP: Time Series Understanding by CLIPZiwen Chen, Xiaoyuan Zhang, Ming Zhu. 4646-4664 [doi]
- MultiAgentESC: A LLM-based Multi-Agent Collaboration Framework for Emotional Support ConversationYangyang Xu, Jinpeng Hu, Zhuoer Zhao, Zhangling Duan, Xiao Sun 0003, Xun Yang 0001. 4665-4681 [doi]
- Continuously Steering LLMs Sensitivity to Contextual Knowledge with Proxy ModelsYilin Wang 0039, Heng Wang 0008, Yuyang Bai, Minnan Luo. 4682-4698 [doi]
- Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds DecodingYun-Shiuan Chuang, Sameer Narendran, Nikunj Harlalka, Alexander Cheung, Sizhe Gao, Siddharth Suresh, Junjie Hu 0001, Timothy T. Rogers. 4699-4713 [doi]
- Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and ExtrapolationJun-Yu Ma, Tianqing Fang, Zhisong Zhang, Hongming Zhang 0009, Haitao Mi, Dong Yu 0001. 4714-4720 [doi]
- Scalable Data Synthesis through Human-like Cognitive Imitation and Data RecombinationZhongyi Ye, Weitai Zhang, Xinyuan Zhou, Yongxin Zhu, Ninghui Rao, Enhong Chen. 4721-4735 [doi]
- BeSimulator: A Large Language Model Powered Text-based Behavior SimulatorJianan Wang, Bin Li, Jingtao Qi, Xueying Wang, Fu Li, Lihanxun Li. 4736-4754 [doi]
- Too Consistent to Detect: A Study of Self-Consistent Errors in LLMsHexiang Tan, Fei Sun 0001, Sha Liu, Du Su, Qi Cao 0005, Xin Chen, Jingang Wang, Xunliang Cai, Yuanzhuo Wang, Huawei Shen, Xueqi Cheng. 4755-4765 [doi]
- pFedGPT: Hierarchically Optimizing LoRA Aggregation Weights for Personalized Federated GPT ModelsZhanming Shen, Tianqi Xu, Hao Wang, Jian Li, Miao Pan. 4766-4778 [doi]
- QSpec: Speculative Decoding with Complementary Quantization SchemesJuntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu. 4779-4795 [doi]
- Co-Evolving LLMs and Embedding Models via Density-Guided Preference Optimization for Text ClusteringZetong Li, Qinliang Su, Minhua Huang, Yin Yang. 4796-4808 [doi]
- P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMsYidan Zhang, Yu Wan 0004, Boyi Deng, Baosong Yang, Haoran Wei, Fei Huang 0002, Bowen Yu 0002, Dayiheng Liu, Junyang Lin, Fei Huang 0005, Jingren Zhou 0001. 4809-4836 [doi]
- Single LLM, Multiple Roles: A Unified Retrieval-Augmented Generation Framework Using Role-Specific Token OptimizationYutao Zhu 0001, Jiajie Jin, Hongjin Qian, Zheng Liu 0011, Zhicheng Dou, Ji-Rong Wen. 4837-4856 [doi]
- TrInk: Ink Generation with Transformer NetworkZezhong Jin, Shubhang Desai, Xu Chen, Biyi Fang, Zhuoyi Huang, Zhe Li 0030, Chong-Xin Gan, Xiao Tu, Man-Wai Mak, Yan Lu 0001, Shujie Liu 0001. 4857-4864 [doi]
- CalligraphicOCR for Chinese Calligraphy RecognitionXiaoyi Bao, Zhongqing Wang, Jinghang Gu, Chu-Ren Huang. 4865-4877 [doi]
- When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language ModelsCheng Wang, Gelei Deng, XiangLin Yang, Han Qiu 0001, Tianwei Zhang 0004. 4878-4888 [doi]
- RESF: Regularized-Entropy-Sensitive Fingerprinting for Black-Box Tamper Detection of Large Language ModelsPingyi Hu, Xiaofan Bai, Xiaojing Ma 0002, Chaoxiang He, Dongmei Zhang 0001, Bin Benjamin Zhu. 4889-4903 [doi]
- Model-based Large Language Model Customization as ServiceZhaomin Wu, Jizhou Guo, Junyi Hou, Bingsheng He, Lixin Fan, Qiang Yang. 4904-4921 [doi]
- Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative AgentsHaochen Sun, Shuwen Zhang, Lujie Niu, Lei Ren, Hao Xu, Hao Fu, Fangkun Zhao, Caixia Yuan, Xiaojie Wang. 4922-4951 [doi]
- Improving Reasoning Capabilities in Small Models through Mixture-of-layers Distillation with Stepwise Attention on Key InformationYao Chen, Jiawei Sheng, Wenyuan Zhang 0002, Tingwen Liu. 4952-4971 [doi]
- Through the Valley: Path to Effective Long CoT Training for Small Language ModelsRenjie Luo, Jiaxi Li, Chen Huang, Wei Lu. 4972-4992 [doi]
- RED: Unleashing Token-Level Rewards from Holistic Feedback via Reward RedistributionJiahui Li 0003, Lin Li 0065, Tai-Wei Chang, Kun Kuang, Long Chen 0016, Jun Zhou 0011, Cheng Yang 0002. 4993-5022 [doi]
- SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language ModelsPeng Ding, Wen Sun, Dailin Li, Wei Zou, Jiaming Wang, Jiajun Chen, Shujian Huang. 5023-5037 [doi]
- InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning StylesZizhen Li, Chuanhao Li 0001, Yibin Wang, Qi Chen, Diping Song, Yukang Feng, Jianwen Sun, Jiaxin Ai, Fanrui Zhang, Mingzhu Sun, Kaipeng Zhang. 5038-5076 [doi]
- MIO: A Foundation Model on Multimodal TokensZekun Moore Wang, King Zhu, Chunpu Xu, Wangchunshu Zhou, Jiaheng Liu, Yibo Zhang, Jessie Wang, Ning Shi, Siyu Li, Yizhi Li, Haoran Que, Zhaoxiang Zhang 0001, Yuanxing Zhang, Ge Zhang 0009, Ke Xu 0001, Jie Fu 0001, Wenhao Huang 0001. 5077-5099 [doi]
- DART: Distilling Autoregressive Reasoning to Silent ThoughtNan Jiang, Ziming Wu, De-Chuan Zhan, Fuming Lai, Shaobing Lian. 5100-5108 [doi]
- LeTS: Learning to Think-and-Search via Process-and-Outcome Reward HybridizationQi Zhang 0077, Shouqing Yang, Lirong Gao, Hao Chen 0102, Xiaomeng Hu, Jinglei Chen, Jiexiang Wang, Sheng Guo, Bo Zheng, Haobo Wang 0001, Junbo Zhao 0002. 5109-5122 [doi]
- CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning via Dual Self-Training and Cycle ConsistencyZhanming Shen, Hao Chen 0102, Yulei Tang, ShaoLin Zhu, Wentao Ye, Xiaomeng Hu, Haobo Wang 0001, Gang Chen 0001, Junbo Zhao 0002. 5123-5137 [doi]
- Good Intentions Beyond ACL: Who Does NLP for Social Good, and Where?Grace LeFevre, Qingcheng Zeng, Adam Leif, Jason Jewell, Denis Peskoff, Rob Voigt. 5138-5150 [doi]
- From General Reward to Targeted Reward: Improving Open-ended Long-context Generation ModelsZhihan Guo, Jiele Wu, Wenqian Cui, Yifei Zhang, Minda Hu, Yufei Wang 0005, Irwin King. 5151-5166 [doi]
- Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning ModelXinyue Lou, You Li 0010, Jinan Xu, Xiangyu Shi, Chi Chen 0005, Kaiyu Huang. 5167-5186 [doi]
- Understanding the Modality Gap: An Empirical Study on the Speech-Text Alignment Mechanism of Large Speech Language ModelsBajian Xiang, Shuaijiang Zhao, Tingwei Guo, Wei Zou. 5187-5202 [doi]
- AssoCiAm: A Benchmark for Evaluating Association Thinking while Circumventing AmbiguityYifan Liu, Wenkuan Zhao, ShanShan Zhong, Jinghui Qin, Mingfu Liang, Zhongzhan Huang, Wushao Wen. 5203-5219 [doi]
- M-BRe: Discovering Training Samples for Relation Extraction from Unlabeled Texts with Large Language ModelsZexuan Li, Hongliang Dai, Piji Li. 5220-5238 [doi]
- R-TOFU: Unlearning in Large Reasoning ModelsSangyeon Yoon, Wonje Jeung, Albert No. 5239-5258 [doi]
- Chat-Driven Text Generation and Interaction for Person RetrievalZequn Xie, Chuxin Wang, Yeqiang Wang, Sihang Cai, Shulei Wang, Tao Jin. 5259-5270 [doi]
- Spontaneous Giving and Calculated Greed in Language ModelsYuxuan Li, Hirokazu Shirado. 5271-5286 [doi]
- SenDetEX: Sentence-Level AI-Generated Text Detection for Human-AI Hybrid Content via Style and Context FusionLei Jiang, Desheng Wu, Xiaolong Zheng 0001. 5287-5302 [doi]
- Judge and Improve: Towards a Better Reasoning of Knowledge Graphs with Large Language ModelsMo Zhiqiang, Yang Hua, Jiahui Li, Yuan Liu, Shawn Wong, Jianmin Huang. 5303-5320 [doi]
- Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy ParadigmZhuo Li, Yuhao Du, Xiaoqi Jiao, Steven Y. Guo, Yuege Feng, Xiang Wan, Anningzhe Gao, Jinpeng Hu. 5321-5340 [doi]
- QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language ModelsJiajun Zhou 0004, Yifan Yang, Kai Zhen, Ziyue Liu 0003, Yequan Zhao, Ershad Banijamali, Athanasios Mouchtaris, Ngai Wong 0001, Zheng Zhang 0005. 5341-5359 [doi]
- Cost-Optimal Grouped-Query Attention for Long-Context ModelingYingfa Chen, Yutong Wu, Chenyang Song, Zhen Leng Thai, Xingyu Shen, Xu Han 0007, Zhiyuan Liu 0001, Maosong Sun 0001. 5360-5376 [doi]
- ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action ModelZhongyi Zhou, Yichen Zhu 0001, Minjie Zhu, Junjie Wen, Ning Liu 0007, Zhiyuan Xu, Weibin Meng, Yaxin Peng, Chaomin Shen 0001, Feifei Feng, Yi Xu. 5377-5395 [doi]
- KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented GenerationZiyi Guan, Jason Chun Lok Li, Zhijian Hou, Pingping Zhang, Donglai Xu, Yuzhi Zhao, Mengyang Wu, Jinpeng Chen 0003, Thanh-Toan Nguyen, Pengfei Xian, Wenao Ma, Shengchao Qin, Graziano Chesi, Ngai Wong 0001. 5396-5405 [doi]
- CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet UpcyclingJihai Zhang 0002, Xiaoye Qu, Tong Zhu 0002, Yu Cheng 0001. 5406-5419 [doi]
- Search-o1: Agentic Search-Enhanced Large Reasoning ModelsXiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou 0002, Yutao Zhu 0001, Peitian Zhang, Zhicheng Dou. 5420-5438 [doi]
- From Personas to Talks: Revisiting the Impact of Personas on LLM-Synthesized Emotional Support ConversationsShenghan Wu, Yimo Zhu, Wynne Hsu, Mong-Li Lee, Yang Deng 0002. 5439-5453 [doi]
- Select-Then-Decompose: From Empirical Analysis to Adaptive Selection Strategy for Task Decomposition in Large Language ModelsShuodi Liu, Yingzhuo Liu, Zi Wang, Yusheng Wang, Huijia Wu, Liuyu Xiang, Zhaofeng He. 5454-5477 [doi]
- TombRaider: Entering the Vault of History to Jailbreak Large Language ModelsJunchen Ding, Jiahao Zhang, Yi Liu 0069, Ziqi Ding, Gelei Deng, Yuekang Li. 5478-5493 [doi]
- Text Meets Topology: Rethinking Out-of-distribution Detection in Text-Rich NetworksDanny Wang, Ruihong Qiu, Guangdong Bai, Zi Huang. 5494-5523 [doi]
- APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal TransportZhuo Li, Yuege Feng, Dandan Guo, Jinpeng Hu, Anningzhe Gao, Xiang Wan. 5524-5538 [doi]
- HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget ReallocationFeng Xiong, Hongling Xu, Yifei Wang, Runxi Cheng, Yong Wang, Xiangxiang Chu. 5539-5555 [doi]
- SEPS: A Separability Measure for Robust Unlearning in LLMsWonje Jeung, Sangyeon Yoon, Albert No. 5556-5587 [doi]
- TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation DetectionZehong Yan, Peng Qi 0005, Wynne Hsu, Mong-Li Lee. 5588-5604 [doi]
- Tree-of-Quote Prompting Improves Factuality and Attribution in Multi-Hop and Medical ReasoningJustin Xu, Yiming Li, Zizheng Zhang, Augustine Yui Hei Luk, Mayank Jobanputra, Samarth Oza, Ashley Murray, Meghana Reddy Kasula, Andrew Parker, David W. Eyre. 5605-5622 [doi]
- UnitCoder: Scalable Code Synthesis from Pre-training CorporaYichuan Ma, Yunfan Shao, Peiji Li, Demin Song, Qipeng Guo, Linyang Li, Xipeng Qiu, Kai Chen 0026. 5623-5641 [doi]
- GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language ModelsJixiao Zhang, Chunsheng Zuo. 5642-5654 [doi]
- Improving Low-Resource Sequence Labeling with Knowledge Fusion and Contextual Label ExplanationsPeichao Lai, Jiaxin Gan, Feiyang Ye 0002, Wentao Zhang 0001, Fangcheng Fu, Yilei Wang, Bin Cui 0001. 5655-5674 [doi]
- Rethinking Cross-Subject Data Splitting for Brain-to-Text DecodingCongchi Yin, Qian Yu 0003, Zhiwei Fang, Changping Peng, Piji Li. 5675-5689 [doi]
- RCScore: Quantifying Response Consistency in Large Language ModelsDongJun Jang, Youngchae Ahn, Hyopil Shin. 5690-5708 [doi]
- A Multi-Agent Framework with Automated Decision Rule Optimization for Cross-Domain Misinformation DetectionHui Li, Ante Wang, Kunquan Li, ZhiHao Wang, Liang Zhang, Delai Qiu, Qingsong Liu, Jinsong Su. 5709-5725 [doi]
- OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial DomainShuting Wang 0002, Jiejun Tan, Zhicheng Dou, Ji-Rong Wen. 5726-5751 [doi]
- AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMsXiaopeng Ke, Hexuan Deng, Xuebo Liu 0002, Jun Rao, Zhenxi Song, Jun Yu 0002, Min Zhang 0005. 5752-5785 [doi]
- MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional ThresholdsJunxi Wu, Jinpeng Wang 0002, Zheng Liu, Bin Chen 0011, Dongjian Hu, Hao Wu, Shu-Tao Xia. 5786-5805 [doi]
- Merger-as-a-Stealer: Stealing Targeted PII from Aligned LLMs with Model MergingLin Lu 0003, Zhigang Zuo, Ziji Sheng, Pan Zhou 0001. 5806-5825 [doi]
- Pragmatic Inference Chain (PIC) Improving LLMs' Reasoning of Authentic Implicit Toxic LanguageXi Chen, Shuo Wang. 5826-5841 [doi]
- Beyond Demonstrations: Dynamic Vector Construction from Latent RepresentationsWang Cai, Hsiu-Yuan Huang, Zhixiang Wang, Yunfang Wu. 5842-5857 [doi]
- Detoxifying Large Language Models via the Diversity of Toxic SamplesYing Zhao, Yuanzhao Guo, Xuemeng Weng, Yuan Tian, Wei Wang, Yi Chang. 5858-5871 [doi]
- LLM-Driven Implicit Target Augmentation and Fine-Grained Contextual Modeling for Zero-Shot and Few-Shot Stance DetectionYanxu Ji, Jinzhong Ning, Yi-Jia Zhang 0001, Zhi Liu 0012, Hongfei Lin. 5872-5884 [doi]
- Dial-In LLM: Human-Aligned LLM-in-the-loop Intent Clustering for Customer Service DialoguesMengZe Hong, Wailing Ng, Chen Jason Zhang, Yuanfeng Song, Di Jiang 0004. 5885-5900 [doi]
- Superficial Self-Improved Reasoners Benefit from Model MergingXiangchi Yuan, Chunhui Zhang, Zheyuan Liu 0010, Dachuan Shi, Leyan Pan, Soroush Vosoughi, Wenke Lee. 5901-5921 [doi]
- CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-TuningWenqiao Zhu, Ji Liu 0003, Rongjunchen Zhang, Haipang Wu, Yulun Zhang 0001. 5922-5937 [doi]
- QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain EvaluationMengZe Hong, Wailing Ng, Chen Jason Zhang, Di Jiang 0004. 5938-5953 [doi]
- VideoEraser: Concept Erasure in Text-to-Video Diffusion ModelsNaen Xu, Jinghuai Zhang, Changjiang Li, Zhi Chen, Chunyi Zhou 0001, Qingming Li, Tianyu Du, Shouling Ji. 5954-5983 [doi]
- Diagram-Driven Course Questions GenerationXinyu Zhang 0021, Lingling Zhang 0005, Yanrui Wu, Muye Huang, Wenjun Wu, Bo Li, Shaowei Wang, Basura Fernando, Jun Liu 0036. 5984-5999 [doi]
- ECC: An Emotion-Cause Conversation Dataset for Empathy ResponseYuanyuan He, Yongsen Pan, Wei Li, Jiali You 0002, Jiawen Deng, Fuji Ren. 6000-6017 [doi]
- ThoughtProbe: Classifier-Guided LLM Thought Space Exploration via Probing RepresentationsZijian Wang, Chang Xu. 6018-6039 [doi]
- JOLT-SQL: Joint Loss Tuning of Text-to-SQL with Confusion-aware Noisy Schema SamplingJinwang Song, Hongying Zan, Kunli Zhang, Lingling Mu, Yingjie Han, Haobo Hua, Min Peng. 6040-6053 [doi]
- DMDTEval: An Evaluation and Analysis of LLMs on Disambiguation in Multi-domain TranslationZhibo Man, Yuanmeng Chen, Yujie Zhang, Jinan Xu. 6054-6071 [doi]
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific LiteratureDavid Wadden, Kejian Shi, Jacob Morrison, Alan Li, Aakanksha Naik, Shruti Singh 0001, Nitzan Barzilay, Kyle Lo, Tom Hope, Luca Soldaini, Shannon Zejiang Shen, Doug Downey, Hannaneh Hajishirzi, Arman Cohan. 6072-6109 [doi]
- MAKAR: a Multi-Agent framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity RecognitionXinkui Lin, Yuhui Zhang, Yongxiu Xu, Kun Huang, Hongzhang Mu, Yubin Wang, Gaopeng Gou, Li Qian, Li Peng, Wei Liu, Jian Luan 0001, Hongbo Xu. 6110-6130 [doi]
- VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language ModelsBingrui Sima, Linhua Cong, Wenxuan Wang 0001, Kun He. 6131-6144 [doi]
- Investigating Neurons and Heads in Transformer-based LLMs for Typographical ErrorsKohei Tsuji, Tatsuya Hiraoka, Yuchang Cheng, Eiji Aramaki, Tomoya Iwakura. 6145-6163 [doi]
- LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling ResearchShuo Yan, Ruochen Li, Ziming Luo, Zimu Wang, Daoyang Li, Liqiang Jing, Kaiyu He, Peilin Wu, Juntong Ni, George Michalopoulos, Yue Zhang, Ziyang Zhang, Mian Zhang, Zhiyu Chen 0002, Xinya Du. 6164-6186 [doi]
- RAV: Retrieval-Augmented Voting for Tactile Descriptions Without TrainingJinlin Wang, Yulong Ji, Hongyu Yang. 6187-6194 [doi]
- Static Word Embeddings for Sentence Semantic RepresentationTakashi Wada 0001, Yuki Hirakawa, Ryotaro Shimizu, Takahiro Kawashima, Yuki Saito 0002. 6195-6211 [doi]
- PropRAG: Guiding Retrieval with Beam Search over Proposition PathsJingjin Wang, Jiawei Han. 6212-6227 [doi]
- Rethinking Backdoor Detection Evaluation for Language ModelsJun Yan 0012, Wenjie Jacky Mo, Xiang Ren 0001, Robin Jia. 6228-6239 [doi]
- Glider: Global and Local Instruction-Driven Expert RouterPingzhi Li, Prateek Yadav, Jaehong Yoon, Jie Peng 0002, Yi-Lin Sung, Mohit Bansal, Tianlong Chen 0001. 6240-6301 [doi]
- CoVoGER: A Multilingual Multitask Benchmark for Speech-to-text Generative Error Correction with Large Language ModelsZhengdong Yang, Zhen Wan, Sheng Li 0010, Chao-Han Huck Yang, Chenhui Chu. 6302-6314 [doi]
- Tiny Budgets, Big Gains: Parameter Placement Strategy in Parameter Super-Efficient Fine-TuningJinman Zhao, Xueyan Zhang, Jiaru Li, Jingcheng Niu, Yulan Hu, Erxue Min, Gerald Penn. 6315-6333 [doi]
- Legal Fact Prediction: The Missing Piece in Legal Judgment PredictionJunkai Liu, Yujie Tong, Hui Huang 0021, Bowen Zheng, Yiran Hu, Peicheng Wu, Chuan Xiao 0001, Makoto Onizuka, Muyun Yang, Shuyuan Zheng. 6334-6349 [doi]
- DAMON: A Dialogue-Aware MCTS Framework for Jailbreaking Large Language ModelsXu Zhang, Xunjian Yin, Dinghao Jing, Huixuan Zhang, Xinyu Hu 0001, Xiaojun Wan 0001. 6350-6366 [doi]
- Multilingual Prompting for Improving LLM Generation DiversityQihan Wang, Shidong Pan, Tal Linzen, Emily Black. 6367-6389 [doi]
- MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent SimulationsGenglin Liu, Vivian T. Le, Salman Rahman, Elisa Kreiss, Marzyeh Ghassemi, Saadia Gabriel. 6390-6417 [doi]
- Identification of Multiple Logical Interpretations in Counter-ArgumentsWenzhi Wang, Paul Reisert, Shoichi Naito, Naoya Inoue, Machi Shimmei, Surawat Pothong, Jungmin Choi, Kentaro Inui. 6418-6433 [doi]
- LyapLock: Bounded Knowledge Preservation in Sequential Large Language Model EditingPeng Wang 0028, Biyu Zhou, Xuehai Tang, Jizhong Han, Songlin Hu 0001. 6434-6459 [doi]
- AlignX: Advancing Multilingual Large Language Models with Multilingual Representation AlignmentMengyu Bu, Shaolei Zhang 0001, Zhongjun He, Hua Wu 0003, Yang Feng 0004. 6460-6489 [doi]
- What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought ReasoningGangwei Jiang, Yahui Liu, Zhaoyi Li, Wei Bi, Fuzheng Zhang, Linqi Song, Ying Wei 0001, Defu Lian. 6490-6514 [doi]
- HD-PiSSA: High-Rank Distributed Orthogonal AdaptationYiding Wang, Fanxu Meng, Xuefeng Zhang, Fan Jiang, Pingzhi Tang, Muhan Zhang. 6515-6528 [doi]
- Firewall Routing: Blocking Leads to Better Hybrid Inference for LLMsRunyu Peng, Yunhua Zhou, Kai Lv 0001, Yang Gao 0042, Qipeng Guo, Xipeng Qiu. 6529-6554 [doi]
- SPE Attention: Making Attention Equivariant to Semantic-Preserving Permutation for Code ProcessingChengyu Jiao, Shuhao Chen, Yu Zhang. 6555-6568 [doi]
- Audio-centric Video Understanding Benchmark without Text ShortcutYudong Yang, Jimin Zhuang, Guangzhi Sun, Changli Tang, Yixuan Li, Peihan Li, Yifan Jiang, Wei Li 0119, Zejun Ma 0001, Chao Zhang 0031. 6569-6587 [doi]
- TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked TextSongshuo Lu, Hua Wang, Yutian Rong, Zhi Chen, Yaohua Tang. 6588-6601 [doi]
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image ExplorationHaozhan Shen, Kangjia Zhao, Tiancheng Zhao, Ruochen Xu, Zilun Zhang, Mingwei Zhu, Jianwei Yin. 6602-6618 [doi]
- Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-ReformulationEnci Zhang, Xingang Yan, Wei Lin, Tianxiang Zhang, Qianchun Lu. 6619-6633 [doi]
- VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMsKeer Lu, Keshi Zhao, Zhuoran Zhang, Zheng Liang, Bin Cui 0001, Tengjiao Wang, Wentao Zhang 0001. 6634-6658 [doi]
- FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language ModelsHengxing Cai, Jinhan Dong, Jingjun Tan, Jingcheng Deng, Sihang Li 0002, Zhifeng Gao, Haidong Wang, Zicheng Su, Agachai Sumalee, Renxin Zhong. 6659-6676 [doi]
- Multimodal Language Models See Better When They Look ShallowerHaoran Chen, Junyan Lin, Xinghao Chen 0009, Yue Fan, Jianfeng Dong, Xin Jin, Hui Su, JinLan Fu, Xiaoyu Shen 0001. 6677-6695 [doi]
- LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and OptimizationXujia Wang, Yunjia Qi, Bin Xu. 6696-6715 [doi]
- Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM WatermarkingTianle Gu, Zongqi Wang, Kexin Huang, Yuanqi Yao, Xiangliang Zhang 0001, Yujiu Yang 0001, Xiuying Chen. 6716-6733 [doi]
- Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender BiasesBufan Gao, Elisa Kreiss. 6734-6750 [doi]
- Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional VerificationJikai Wang, Zhenxu Tian, Juntao Li 0005, Qingrong Xia, Xinyu Duan, Zhe-Feng Wang 0001, Baoxing Huai, Min Zhang 0005. 6751-6763 [doi]
- ViLBench: A Suite for Vision-Language Process Reward ModelingHaoqin Tu, Weitao Feng, Hardy Chen, Hui Liu 0031, Xianfeng Tang, Cihang Xie. 6764-6779 [doi]
- Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question AnsweringHwan Chang, Yumin Kim, Yonghyun Jun, Hwanhee Lee. 6780-6800 [doi]
- Route Sparse Autoencoder to Interpret Large Language ModelsWei Shi, Sihang Li 0002, Tao Liang, Mingyang Wan, Guojun Ma, Xiang Wang 0010, Xiangnan He 0001. 6801-6815 [doi]
- BTS: Harmonizing Specialized Experts into a Generalist LLMQizhen Zhang 0002, Prajjwal Bhargava, Chloe Bi, Chris X. Cai, Jakob Nicolaus Foerster, Jeremy Fu, Punit Singh Koura, Ruan Silva, Sheng Shen 0016, Emily Dinan, Suchin Gururangan, Mike Lewis. 6816-6834 [doi]
- CoCoA: Confidence- and Context-Aware Adaptive Decoding for Resolving Knowledge Conflicts in Large Language ModelsAnant Khandelwal, Manish Gupta 0001, Puneet Agrawal. 6835-6855 [doi]
- R-Bind: Unified Enhancement of Attribute and Relation Binding in Text-to-Image Diffusion ModelsHuixuan Zhang, Xiaojun Wan. 6856-6870 [doi]
- Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop LearningZinan Tang 0001, Xin Gao 0001, Qizhi Pei, Zhuoshi Pan, Mengzhang Cai, Jiang Wu 0003, Conghui He, Lijun Wu 0003. 6871-6891 [doi]
- Information Integration in Large Language Models is Gated by Linguistic Structural MarkersWei Liu, Nai Ding. 6892-6904 [doi]
- Why and How LLMs Benefit from Knowledge Introspection in Commonsense ReasoningChengfeng Zhao, Shizhu He, Shanshan Jiang 0001, Bin Dong 0003, Jun Zhao 0001, Kang Liu 0001. 6905-6920 [doi]
- GraDaSE: Graph-Based Dataset Search with ExamplesJing He, Mingyang Lv, Qing Shi, Gong Cheng. 6921-6932 [doi]
- Confidence-guided Refinement Reasoning for Zero-shot Question AnsweringYouwon Jang, Woo Suk Choi, Minjoon Jung, Min Su Lee, Byoung-Tak Zhang. 6933-6950 [doi]
- DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought CorrectionYiqi Li, Yusheng Liao, Zhe Chen, Yanfeng Wang, Yu Wang. 6951-6966 [doi]
- CTCC: A Robust and Stealthy Fingerprinting Framework for Large Language Models via Cross-Turn Contextual Correlation BackdoorZhenhua Xu 0004, Xixiang Zhao, Xubin Yue, Shengwei Tian, Changting Lin, Meng Han. 6967-6989 [doi]
- Realistic Training Data Generation and Rule Enhanced Decoding in LLM for NameGuessYikuan Xia, Jiazun Chen, Sujian Li, Jun Gao. 6990-7007 [doi]
- EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic FingerprintZhenhua Xu 0004, Meng Han, Wenpeng Xing. 7008-7031 [doi]
- Selective Preference Optimization via Token-Level Reward Function EstimationKailai Yang, Zhiwei Liu 0003, Qianqian Xie, Jimin Huang, Erxue Min, Sophia Ananiadou. 7032-7056 [doi]
- Arena-lite: Efficient and Reliable Large Language Model Evaluation via Tournament-Based Direct ComparisonsSeonil Son, Ju-Min Oh, Heegon Jin, Cheolhun Jang, Jeongbeom Jeong, Kuntae Kim. 7057-7075 [doi]
- Addressing Tokenization Inconsistency in Steganography and Watermarking Based on Large Language ModelsRuiyi Yan, Yugo Murawaki. 7076-7098 [doi]
- ExeCoder: Empowering Large Language Models with Executability Representation for Code TranslationMinghua He, Yue Chen 0014, Fangkai Yang, Pu Zhao 0004, Wenjie Yin, Yu Kang 0006, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang 0001. 7099-7125 [doi]
- TableEval: A Real-World Benchmark for Complex, Multilingual, and Multi-Structured Table Question AnsweringJunnan Zhu, Jingyi Wang, Bohan Yu, Xiaoyu Wu, Junbo Li, Lei Wang, Nan Xu. 7126-7146 [doi]
- NOVA-63: Native Omni-lingual Versatile Assessments of 63 DisciplinesJinyang Zhang, Kexin Yang 0002, Yu Wan 0004, Muyang Ye, Baosong Yang, Fei Huang, Junyang Lin, Dayiheng Liu. 7147-7189 [doi]
- InfoGain-RAG: Boosting Retrieval-Augmented Generation through Document Information Gain-based Reranking and FilteringZihan Wang, Zihan Liang 0001, Zhou Shao, Yufei Ma 0011, Huangyu Dai, Ben Chen, Lingtao Mao, Chenyi Lei, Yuqing Ding, Han Li 0005. 7190-7204 [doi]
- SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token PruningYicheng Ji, Jun Zhang 0069, Heming Xia, Jinpeng Chen 0001, Lidan Shou, Gang Chen 0001, Huan Li 0003. 7205-7219 [doi]
- What Do Indonesians Really Need from Language Technology? A Nationwide SurveyMuhammad Dehan Al Kautsar, Lucky Susanto, Derry Tanti Wijaya, Fajri Koto. 7220-7245 [doi]
- LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal ExpertsYimu Wang, Mozhgan Nasr Azadani, Sean Sedwards, Krzysztof Czarnecki 0001. 7246-7261 [doi]
- Confounding Factors in Relating Model Performance to MorphologyWessel Poelman, Thomas Bauwens, Miryam de Lhoneux. 7262-7287 [doi]
- Context-Aware Membership Inference Attacks against Pre-trained Large Language ModelsHongyan Chang, Ali Shahin Shamsabadi, Kleomenis Katevas, Hamed Haddadi, Reza Shokri. 7288-7310 [doi]
- Formalizing Style in Personal NarrativesGustave Cortal, Alain Finkel. 7311-7326 [doi]
- TopicAttack: An Indirect Prompt Injection Attack via Topic TransitionYulin Chen, Haoran Li 0003, Yuexin Li, Yue Liu 0008, Yangqiu Song, Bryan Hooi. 7327-7345 [doi]
- PSET: a Phonetics-Semantics Evaluation TestbedGianluca Sperduti, Dong Nguyen. 7346-7356 [doi]
- From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel CorporaYingli Shen, Wen Lai, Shuo Wang 0013, Ge Gao, Kangyang Luo, Alexander Fraser 0001, Maosong Sun 0001. 7357-7379 [doi]
- GATEAU: Selecting Influential Samples for Long Context AlignmentShuzheng Si, Haozhe Zhao, Gang Chen 0039, Yunshui Li, Kangyang Luo, Chuancheng Lv, Kaikai An, Fanchao Qi, Baobao Chang, Maosong Sun 0001. 7380-7411 [doi]
- Teach Small Models to Reason by Curriculum DistillationWangyi Jiang, Yaojie Lu 0001, Hongyu Lin, Xianpei Han, Le Sun 0001. 7412-7422 [doi]
- Enhancing Reasoning Abilities of Small LLMs with Cognitive AlignmentWenrui Cai 0001, Chengyu Wang 0001, Junbing Yan, Jun Huang 0007, Xiangzhong Fang. 7423-7438 [doi]
- NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement LearningWei Liu 0161, Siya Qi, Xinyu Wang, Chen Qian, Yali Du 0001, Yulan He 0001. 7439-7458 [doi]
- Genre Matters: How Text Types Interact with Decoding Strategies and Lexical Predictors in Shaping Reading BehaviorLena Sophia Bolliger, Lena Ann Jäger. 7459-7476 [doi]
- RTE-GMoE: A Model-agnostic Approach for Relation Triplet Extraction via Graph-based Mixture-of-Expert Mutual LearningAziguli Wulamu, Kaiyuan Gong, Lyu Zhengyu, Yu Han, Zhihong Zhu, Bowen Xing. 7477-7488 [doi]
- Avoidance Decoding for Diverse Multi-Branch Story GenerationKyeongman Park, Nakyeong Yang, Kyomin Jung. 7489-7505 [doi]
- Probabilistic Soundness Guarantees in LLM Reasoning ChainsWeiqiu You, Anton Xue, Shreya Havaldar, Delip Rao, Helen Jin, Chris Callison-Burch, Eric Wong 0001. 7506-7525 [doi]
- SQLWOZ: A Realistic Task-Oriented Dialogue Dataset with SQL-Based Dialogue State Representation for Complex User RequirementsHeng-Da Xu, Xian-Ling Mao, Fanshu Sun, Tian-Yi Che, Cheng-Xin Xin, Heyan Huang. 7526-7551 [doi]
- SURE: Safety Understanding and Reasoning Enhancement for Multimodal Large Language ModelsYuxin Gou, Xiaoning Dong, Qin Li, Shishen Gu, Richang Hong, Wenbo Hu 0001. 7552-7593 [doi]
- EMO: Embedding Model Distillation via Intra-Model Relation and Optimal Transport AlignmentsMinh-Phuc Truong, Hai An Vu, Tu Vu, Nguyen Thi Ngoc Diep, Linh Van Ngo 0001, Thien Huu Nguyen, Trung Le 0001. 7594-7606 [doi]
- AesBiasBench: Evaluating Bias and Alignment in Multimodal Language Models for Personalized Image Aesthetic AssessmentKun Li 0015, Lai-Man Po, Hongzheng Yang, Xuyuan Xu, Kangcheng Liu, Yuzhi Zhao. 7607-7620 [doi]
- DA-Pred: Performance Prediction for Text Summarization under Domain-Shift and Instruct-TuningAnum Afzal, Florian Matthes, Alexander R. Fabbri. 7621-7632 [doi]
- UnCo: Uncertainty-Driven Collaborative Framework of Large and Small Models for Grounded Multimodal NERJielong Tang, Yang Yang, Jianxing Yu, Zhen-Xing Wang, Haoyuan Liang, Liang Yao, Jian Yin. 7633-7651 [doi]
- An Empirical Study of LLM Reasoning Ability Under Strict Output Length ConstraintYi Sun, Han Wang, Jiaqiang Li, Jiacheng Liu, Xiangyu Li, Hao Wen, Yizhen Yuan, Huiwen Zheng, Yan Liang, Yuanchun Li, Yunxin Liu. 7652-7671 [doi]
- Enrich-on-Graph: Query-Graph Alignment for Complex Reasoning with LLM EnrichingSongze Li, Zhiqiang Liu, Zhengke Gui, Huajun Chen, Wen Zhang. 7672-7692 [doi]
- Noise, Adaptation, and Strategy: Assessing LLM Fidelity in Decision-MakingYuanjun Feng, Vivek Choudhary, Yash Raj Shrestha. 7693-7706 [doi]
- Structuring Radiology Reports: Challenging LLMs with Lightweight ModelsJohannes Moll, Louisa Fay, Asfandyar Azhar, Sophie Ostmeier, Sergios Gatidis, Tim C. Lueth, Curtis Langlotz, Jean-Benoit Delbrouck. 7707-7724 [doi]
- PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing TasksYunuo Liu, Dawei Zhu, Zena Al-Khalili, Dai Cheng, Yanjun Chen 0001, Dietrich Klakow, Wei Zhang 0185, Xiaoyu Shen 0001. 7725-7734 [doi]
- EcoTune: Token-Efficient Multi-Fidelity Hyperparameter Optimization for Large Language Model InferenceYuebin Xu, Zhiyi Chen, Zeyi Wen. 7735-7745 [doi]
- Investigating Value-Reasoning Reliability in Small Large Language ModelsXia Du, Shuhan Sun, Pengyuan Liu, Dong Yu. 7746-7786 [doi]
- Can LLMs Explain Themselves Counterfactually?Zahra Dehghanighobadi, Asja Fischer, Muhammad Bilal Zafar. 7787-7815 [doi]
- Self-Adjust SoftmaxChuanyang Zheng, Yihang Gao, Guoxuan Chen, Han Shi, Jing Xiong, Xiaozhe Ren, Chao Huang, Zhenguo Li, Yu Li 0006. 7816-7836 [doi]
- DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph RefinementShaoqing Lin, Chong Teng, Fei Li 0021, Donghong Ji, Lizhen Qu, Zhuang Li 0001. 7837-7862 [doi]
- XAutoLM: Efficient Fine-Tuning of Language Models via Meta-Learning and AutoMLErnesto Luis Estevanell-Valladares, Suilan Estevez-Velarde, Yoan Gutiérrez, Andrés Montoyo, Ruslan Mitkov. 7863-7880 [doi]
- UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language ModelsRoman Vashurin, Maiya Goloburda, Preslav Nakov, Maxim Panov. 7881-7908 [doi]
- WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement LearningZhepei Wei, Wenlin Yao, Yao Liu, Weizhi Zhang, Qin Lu, Liang Qiu, Changlong Yu, Puyang Xu, Chao Zhang 0014, Bing Yin, Hyokun Yun, Lihong Li 0001. 7909-7928 [doi]
- Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language ModelsTobias Domhan, Dawei Zhu. 7929-7947 [doi]
- PAKTON: A Multi-Agent Framework for Question Answering in Long Legal AgreementsPetros Raptopoulos, Giorgos Filandrianos, Maria Lymperaiou, Giorgos Stamou. 7948-7984 [doi]
- PoSum-Bench: Benchmarking Position Bias in LLM-based Conversational SummarizationXu Sun, Lionel Delphin-Poulat, Christèle Tarnec, Anastasia Shimorina. 7985-8009 [doi]
- ConCISE: Confidence-guided Compression in Step-by-step Efficient ReasoningZiqing Qiao, Yongheng Deng, Jiali Zeng, Dong Wang, Lai Wei, Guanbo Wang, Fandong Meng, Jie Zhou 0016, Ju Ren 0001, Yaoxue Zhang. 8010-8029 [doi]
- Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety AlignmentHao Li, Lijun Li, Zhenghao Lu, Xianyi Wei, Rui Li, Jing Shao, Lei Sha. 8030-8050 [doi]
- Cross-domain Rumor Detection via Test-Time Adaptation and Large Language ModelsYuxia Gong, Shuguo Hu, Huaiwen Zhang. 8051-8066 [doi]
- MLWQ: Efficient Small Language Model Deployment via Multi-Level Weight QuantizationChun Hu, Junhui He, Shangyu Wu, Yuxin He, Chun Jason Xue, Qingan Li. 8067-8077 [doi]
- ToDi: Token-wise Distillation via Fine-Grained Divergence ControlSeongryong Jung, Suwan Yoon, Donggeon Kim, Hwanhee Lee. 8078-8091 [doi]
- RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code GenerationQingyao Li, Wei Xia 0001, Xinyi Dai, Kounianhua Du, Weiwen Liu, Yasheng Wang, Ruiming Tang, Yong Yu 0001, Weinan Zhang 0001. 8092-8110 [doi]
- Probing for Arithmetic Errors in Language ModelsYucheng Sun, Alessandro Stolfo, Mrinmaya Sachan. 8111-8128 [doi]
- NILE: Internal Consistency Alignment in Large Language ModelsMinda Hu, Qiyuan Zhang 0001, Yufei Wang 0005, Bowei He, Hongru Wang 0003, Jingyan Zhou, Liangyou Li, Yasheng Wang, Chen Ma 0001, Irwin King. 8129-8147 [doi]
- Mining the Past with Dual Criteria: Integrating Three types of Historical Information for Context-aware Event ForecastingRong Ma, Lei Wang 0065, Yating Yang, Bo Ma 0004, Rui Dong 0002, Fengyi Yang, Ahtamjan Ahmat, Kaiwen Lu, Xinyue Wang. 8148-8163 [doi]
- RAGferee: Building Contextual Reward Models for Retrieval-Augmented GenerationAndrei Catalin Coman, Ionut-Teodor Sorodoc, Leonardo F. R. Ribeiro, Bill Byrne, James Henderson, Adrià de Gispert. 8164-8211 [doi]
- Large Language Models Discriminate Against Speakers of German DialectsMinh Duc Bui, Carolin Holtermann, Valentin Hofmann, Anne Lauscher, Katharina von der Wense. 8212-8240 [doi]
- Uncovering Argumentative Flow: A Question-Focus Discourse Structuring FrameworkYini Wang, Xian Zhou, Shengan Zheng, Linpeng Huang, Zhunchen Luo, Wei Luo, Xiaoying Bai. 8241-8259 [doi]
- AbsVis - Benchmarking How Humans and Vision-Language Models "See" Abstract Concepts in ImagesTarun Tater, Diego Frassinelli, Sabine Schulte im Walde. 8260-8281 [doi]
- A Rigorous Evaluation of LLM Data Generation Strategies for Low-Resource LanguagesTatiana Anikina, Ján Cegin, Jakub Simko, Simon Ostermann 0002. 8282-8303 [doi]
- Alignment with Fill-In-the-Middle for Enhancing Code GenerationHouxing Ren, Zimu Lu, Weikang Shi, Haotian Hou, Yunqiao Yang, Ke Wang 0036, Aojun Zhou, Junting Pan, Mingjie Zhan, Hongsheng Li 0001. 8304-8320 [doi]
- A Middle Path for On-Premises LLM Deployment: Preserving Privacy Without Sacrificing Model ConfidentialityHanbo Huang, Yihan Li, Bowen Jiang, Bo Jiang, Lin Liu 0018, Zhuotao Liu, Ruoyu Sun 0001, Shiyu Liang. 8321-8359 [doi]
- Variance Sensitivity Induces Attention Entropy Collapse and Instability in TransformersJonghyun Hong, Sungyoon Lee. 8360-8378 [doi]
- X-FLoRA: Cross-modal Federated Learning with Modality-expert LoRA for Medical VQAMin-Hyuk Kim, Changheon Kim, Seok Bong Yoo. 8379-8397 [doi]
- Robust Native Language Identification through Agentic DecompositionAhmet Yavuz Uluslu, Tannon Kew, Tilia Ellendorff, Gerold Schneider, Rico Sennrich. 8398-8414 [doi]
- ConsistentChat: Building Skeleton-Guided Consistent Multi-Turn Dialogues for Large Language Models from ScratchJiawei Chen 0011, Xinyan Guan, Qianhao Yuan, Guozhao Mo, Weixiang Zhou, Yaojie Lu 0001, Hongyu Lin, Ben He 0001, Le Sun 0001, Xianpei Han. 8415-8441 [doi]
- Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical StudyYizheng Sun, Hao Li 0074, Chang Xu 0008, Hongpeng Zhou, Chenghua Lin, Riza Batista-Navarro, Jingyuan Sun. 8442-8456 [doi]
- When Annotators Disagree, Topology Explains: Mapper, a Topological Tool for Exploring Text Embedding Geometry and AmbiguityNisrine Rair, Alban Goupil, Valeriu Vrabie, Emmanuel Chochoy. 8457-8480 [doi]
- Self-Critique and Refinement for Faithful Natural Language ExplanationsYingming Wang, Pepa Atanasova. 8481-8507 [doi]
- The Psychology of Falsehood: A Human-Centric Survey of Misinformation DetectionArghodeep Nandi, Megha Sundriyal, Euna Mehnaz Khan, Jikai Sun, Emily K. Vraga, Jaideep Srivastava, Tanmoy Chakraborty 0002. 8508-8525 [doi]
- SEAL: Structure and Element Aware Learning Improves Long Structured Document RetrievalXinhao Huang, Zhibo Ren, Yipeng Yu, Ying Zhou, Zulong Chen, Zeyi Wen. 8526-8536 [doi]
- AnchorAttention: Difference-Aware Sparse Attention with Stripe GranularityYu Zhang, Dong Guo, Fang Wu, Guoliang Zhu, Dian Ding, Yiming Zhang. 8537-8549 [doi]
- Attacks by Content: Automated Fact-checking is an AI Security IssueMichael Sejr Schlichtkrull. 8550-8565 [doi]
- MUZO: Leveraging Multiple Queries and Momentum for Zeroth-Order Fine-Tuning of Large Language ModelsYuezhang Peng, Yuxin Liu, Fei Wen, Xie Chen 0001. 8566-8584 [doi]
- Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text DetectorsHao Fang 0011, Jiawei Kong, Tianqu Zhuang, Yixiang Qiu, Kuofeng Gao, Bin Chen 0011, Shu-Tao Xia, Yaowei Wang 0001, Min Zhang 0005. 8585-8602 [doi]
- Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QASergey Pletenev, Maria Marina, Nikolay Ivanov, Daria Galimzianova, Nikita Krayko, Mikhail Salnikov, Vasily Konovalov, Alexander Panchenko, Viktor Moskvoretskii. 8603-8620 [doi]
- Steering Language Models in Multi-Token Generation: A Case Study on Tense and AspectAlina Klerings, Jannik Brinkmann, Daniel Ruffinelli, Simone Paolo Ponzetto. 8621-8639 [doi]
- DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG RerankersNavve Wasserman, Oliver Heinimann, Yuval Golbari, Tal Zimbalist, Eli Schwartz, Michal Irani. 8640-8658 [doi]
- Reason to Rote: Rethinking Memorization in ReasoningYupei Du, Philipp Mondorf, Silvia Casola, Yuekun Yao, Robert Litschko, Barbara Plank. 8659-8679 [doi]
- VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image CaptionsKazuki Matsuda, Yuiga Wada, Shinnosuke Hirano, Seitaro Otsuki, Komei Sugiura. 8680-8696 [doi]
- LLM-Independent Adaptive RAG: Let the Question Speak for ItselfMaria Marina, Nikolay Ivanov, Sergey Pletenev, Mikhail Salnikov, Daria Galimzianova, Nikita Krayko, Vasily Konovalov, Alexander Panchenko, Viktor Moskvoretskii. 8697-8709 [doi]
- TurnBack: A Geospatial Route Cognition Benchmark for Large Language Models through Reverse RouteHongyi Luo, Qing Cheng 0001, Daniel Matos, Hari Krishna Gadi, Yanfeng Zhang 0004, Lu Liu, Yongliang Wang, Niclas Zeller, Daniel Cremers, Liqiu Meng. 8710-8729 [doi]
- Certainty in Uncertainty: Reasoning over Uncertain Knowledge Graphs with Statistical GuaranteesYuqicheng Zhu, Jingcheng Wu, Yizhen Wang, Hongkuan Zhou, Jiaoyan Chen 0001, Evgeny Kharlamov, Steffen Staab. 8730-8752 [doi]
- Beyond Seen Data: Improving KBQA Generalization Through Schema-Guided Logical Form GenerationShengxiang Gao, Jey Han Lau, Jianzhong Qi 0001. 8753-8772 [doi]
- A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit InterpolationYan Li, Tianyi Zhang, Zechuan Li, Caren Han. 8773-8793 [doi]
- Taming Text-to-Image Synthesis for Novices: User-centric Prompt Generation via Multi-turn GuidanceYilun Liu 0001, Minggui He, Feiyu Yao, Yuhe Ji, Shimin Tao, Jingzhou Du, Justin Li, Jian Gao, Zhang Li, Hao Yang 0006, Boxing Chen, Osamu Yoshie. 8794-8811 [doi]
- We Need to Measure Data Diversity in NLP - Better and BroaderDong Nguyen 0002, Esther Ploeger. 8812-8821 [doi]
- Sheaf Discovery with Joint Computation Graph Pruning and Flexible GranularityLei Yu, Jingcheng Niu, Zining Zhu 0001, Xi Chen, Gerald Penn. 8822-8837 [doi]
- Hierarchical Bracketing Encodings Work for Dependency GraphsAna Ezquerro, Carlos Gómez-Rodríguez, David Vilares 0001. 8838-8851 [doi]
- Multimodal Fine-grained Context Interaction Graph Modeling for Conversational Speech SynthesisZhenqi Jia, Rui Liu 0008, Berrak Sisman, Haizhou Li 0001. 8852-8858 [doi]
- Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language ModelsMehdi Ali, Manuel Brack, Max Lübbering, Elias Wendt, Abbas Goher Khan, Richard Rutmann, Alex Jude, Maurice Kraus, Alexander Arno Weber, Felix Stollenwerk, David Kaczér, Florian Mai, Lucie Flek, Rafet Sifa, Nicolas Flores-Herr, Joachim Köhler, Patrick Schramowski, Michael Fromm 0001, Kristian Kersting. 8859-8898 [doi]
- Conditional [MASK] Discrete Diffusion Language ModelHyukhun Koh, Minha Jhang, Dohyung Kim, Sangmook Lee, Kyomin Jung. 8899-8923 [doi]
- Language-Guided Temporal Token Pruning for Efficient VideoLLM ProcessingYogesh Kumar. 8924-8931 [doi]
- A Fully Probabilistic Perspective on Large Language Model Unlearning: Evaluation and OptimizationAnda Cheng, Wei Huang 0039, Yinggui Wang. 8932-8943 [doi]
- IIET: Efficient Numerical Transformer via Implicit Iterative Euler MethodXinyu Liu, Bei Li, Jiahao Liu, Junhao Ruan, Kechen Jiao, Hongyin Tang, Jingang Wang, Tong Xiao, Jingbo Zhu. 8944-8958 [doi]
- WebEvolver: Enhancing Web Agent Self-Improvement with Co-evolving World ModelTianqing Fang, Hongming Zhang 0009, Zhisong Zhang, Kaixin Ma, Wenhao Yu 0002, Haitao Mi, Dong Yu 0001. 8959-8975 [doi]
- Leveraging Semantic Triples for Private Document Generation with Local Differential Privacy GuaranteesStephen Meisenbacher, Maulik Chevli, Florian Matthes. 8976-8992 [doi]
- HVGuard: Utilizing Multimodal Large Language Models for Hateful Video DetectionYiheng Jing, Mingming Zhang, Yong Zhuang, Jiacheng Guo, Juan Wang 0006, Xiaoyang Xu 0001, Wenzhe Yi, Keyan Guo, Hongxin Hu. 8993-9006 [doi]
- Accelerate Parallelizable Reasoning via Parallel Decoding within One SequenceYijiong Yu, Wei Wang, Ran Chen, Ji Pei. 9007-9014 [doi]
- SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from DesignWenxin Tang, Jingyu Xiao, Wenxuan Jiang, Xi Xiao, Yuhang Wang, Xuxin Tang, Qing Li 0006, Yuehe Ma, Junliang Liu, Shisong Tang, Michael R. Lyu. 9015-9039 [doi]
- LLM-OREF: An Open Relation Extraction Framework Based on Large Language ModelsHongyao Tu, Liang Zhang, Yujie Lin, Xin Lin, Haibo Zhang, Long Zhang, Jinsong Su. 9040-9052 [doi]
- Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference OptimizationJian Li, Shenglin Yin, Yujia Zhang, Alan Zhao, Xi Chen, Xiaohui Zhou, Pengfei Xu. 9053-9063 [doi]
- Improving Multilingual Retrieval-Augmented Language Models through Dialectic Reasoning ArgumentationsLeonardo Ranaldi, Federico Ranaldi, Fabio Massimo Zanzotto, Barry Haddow, Alexandra Birch. 9064-9085 [doi]
- Predicate-Guided Generation for Mathematical ReasoningJiajun Chen, Yik-Cheung Tam. 9086-9099 [doi]
- ComplexTempQA: A 100m Dataset for Complex Temporal Question AnsweringRaphael Gruber, Abdelrahman Abdallah, Michael Färber 0001, Adam Jatowt. 9100-9112 [doi]
- ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning AgentsQiuchen WANG, Ruixue Ding, Zehui Chen, Weiqi Wu, Shihang Wang, Pengjun Xie, Feng Zhao. 9113-9134 [doi]
- IndoSafety: Culturally Grounded Safety for LLMs in Indonesian LanguagesMuhammad Falensi Azmi, Muhammad Dehan Al Kautsar, Alfan Farizki Wicaksono, Fajri Koto. 9135-9166 [doi]
- Can LLMs Help You at Work? A Sandbox for Evaluating LLM Agents in Enterprise EnvironmentsHarsh Vishwakarma, Ankush Agarwal, Ojas Patil, Chaitanya Devaguptapu, Mahesh Chandran. 9167-9201 [doi]
- Steering LLM Reasoning Through Bias-Only AdaptationViacheslav Sinii, Alexey Gorbatovski, Artem Cherepanov, Boris Shaposhnikov, Nikita Balagansky, Daniil Gavrilov. 9202-9211 [doi]
- VLASCD: A Visual Language Action Model for Simultaneous Chatting and Decision MakingZuojin Tang, Bin Hu, Chenyang Zhao, De Ma, Gang Pan 0001, Bin Liu. 9212-9232 [doi]
- M-LongDoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning FrameworkYew Ken Chia, LiYing Cheng, Hou Pong Chan, Maojia Song, Chaoqun Liu, Mahani Aljunied, Soujanya Poria, Lidong Bing. 9233-9250 [doi]
- Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language ModelsPu Jian, Junhong Wu, Wei Sun, Chen Wang, Shuo Ren, Jiajun Zhang. 9251-9270 [doi]
- FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human FeedbackYouquan Li, Miao Zheng, Fan Yang, Guosheng Dong, Bin Cui 0001, Weipeng Chen, Zenan Zhou, Wentao Zhang 0001. 9271-9291 [doi]
- HYDRA: A Multi-Head Encoder-only Architecture for Hierarchical Text ClassificationFabian Karl 0001, Ansgar Scherp. 9292-9303 [doi]
- CARD: Cross-modal Agent Framework for Generative and Editable Residential DesignPengyu Zeng, Jun Yin, Miao Zhang, Yuqin Dai, Jizhizi Li, Zhanxiang Jin, Shuai Lu. 9304-9319 [doi]
- DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-offJusheng Zhang, Yijia Fan, Kaitong Cai, Zimeng Huang, Xiaofei Sun, Jian Wang 0100, Chengpei Tang, Keze Wang. 9320-9340 [doi]
- FaST: Feature-aware Sampling and Tuning for Personalized Preference Alignment with Limited DataThibaut Thonet, Germán Kruszewski, Jos Rozen, Pierre Erbacher, Marc Dymetman. 9341-9370 [doi]
- On LLM-Based Scientific Inductive Reasoning Beyond EquationsBrian S. Lin, Jiaxin Yuan, Zihan Zhou, Shouli Wang, Shuo Wang 0013, Cunliang Kong, Qi Shi 0002, Yuxuan Li, Liner Yang, Zhiyuan Liu 0001, Maosong Sun 0001. 9371-9394 [doi]
- SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption EvaluationXiaofu Chen, Israfel Salazar, Yova Kementchedjhieva. 9395-9407 [doi]
- LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical EncodingYuxuan Hu, Jihao Liu, Ke Wang 0036, Jinliang Zheng, Weikang Shi, Manyuan Zhang, Qi Dou 0001, Rui Liu 0019, Aojun Zhou, Hongsheng Li 0001. 9408-9421 [doi]
- Does quantization affect models' performance on long-context tasks?Anmol Mekala, Anirudh Atmakuru, Yixiao Song, Marzena Karpinska, Mohit Iyyer. 9422-9470 [doi]
- Token-Aware Editing of Internal Activations for Large Language Model AlignmentTianbo Wang, Yuqing Ma, Kewei Liao, Chengzhao Yang, Zhange Zhang, Jiakai Wang, Xianglong Liu 0001. 9471-9509 [doi]
- Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMsDawid Jan Kopiczko, Tijmen Blankevoort, Yuki M. Asano. 9510-9536 [doi]
- Disambiguation in Conversational Question Answering in the Era of LLMs and Agents: A SurveyMd. Mehrab Tanjim, Yeonjun In, Xiang Chen 0010, Victor S. Bursztyn, Ryan A. Rossi, SungChul Kim, Guang-Jie Ren, Vaishnavi Muppala, Shun Jiang, YongSung Kim, Chanyoung Park 0001. 9537-9550 [doi]
- Plan Dynamically, Express Rhetorically: A Debate-Driven Rhetorical Framework for Argumentative WritingXueguan Zhao, Wenpeng Lu, Chaoqun Zheng, Weiyu Zhang 0001, Jiasheng Si, Deyu Zhou. 9551-9573 [doi]
- TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-makingKechen Jiao, Zhirui Fang, Jiahao Liu, Bei Li, Qifan Wang, Xinyu Liu, Junhao Ruan, Zhongjian Qiao, Yifan Zhu, Yaxin Xu, Jingang Wang, Xiu Li 0001. 9574-9588 [doi]
- Reimagining Safety Alignment with An ImageYifan Xia, Guorui Chen, Wenqian Yu, Zhijiang Li, Philip Torr 0001, Jindong Gu. 9589-9603 [doi]
- Generative or Discriminative? Revisiting Text Classification in the Era of TransformersSiva Rajesh Kasa, Karan Gupta 0002, Sumegh Roychowdhury, Ashutosh Kumar, Yaswanth Biruduraju, Santhosh Kumar Kasa, Nikhil Priyatam Pattisapu, Arindam Bhattacharya, Shailendra Agarwal, Vijay Huddar. 9604-9626 [doi]
- Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context InjectionZiqi Miao, Yi Ding, Lijun Li, Jing Shao. 9627-9644 [doi]
- Can Large Language Models Win the International Mathematical Games?Alessio Cocchieri, Luca Ragazzi, Giuseppe Tagliavini, Lorenzo Tordi, Antonella Carbonaro, Gianluca Moro. 9645-9671 [doi]
- CodeArena: Evaluating and Aligning CodeLLMs on Human PreferenceJian Yang 0003, Jiaxi Yang 0004, Wei Zhang 0021, Jin Ke, Yibo Miao, Lei Zhang 0201, Liqun Yang, Zeyu Cui, Yichang Zhang, Zhoujun Li 0001, Binyuan Hui, Junyang Lin. 9672-9683 [doi]
- Language models can learn implicit multi-hop reasoning, but only if they have lots of training dataYuekun Yao, Yupei Du, Dawei Zhu, Michael Hahn 0001, Alexander Koller. 9684-9702 [doi]
- UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency AssessmentJoseph Marvin Imperial, Abdullah Barayan, Regina Stodden, Rodrigo Wilkens, Ricardo Muñoz Sánchez, Lingyun Gao, Melissa Torgbi, Dawn Knight, Gail Forey, Reka R. Jablonkai, Ekaterina Kochmar, Robert Reynolds 0001, Eugénio Ribeiro, Horacio Saggion, Elena Volodina, Sowmya Vajjala, Thomas François, Fernando Alva-Manchego, Harish Tayyar Madabushi. 9703-9755 [doi]
- CROP: Contextual Region-Oriented Visual Token PruningJiawei Guo, Feifei Zhai, Pu Jian, Qianrun Wei, Yu Zhou. 9756-9772 [doi]
- CR4-NarrEmote: An Open Vocabulary Dataset of Narrative Emotions Derived Using Citizen ScienceAndrew Piper, Robert Budac. 9773-9784 [doi]
- XQuant: Achieving Ultra-Low Bit KV Cache Quantization with Cross-Layer CompressionHaoqi Yang 0001, Yao Yao, Zuchao Li, Baoyuan Qi, Liu Guoming, Hai Zhao 0001. 9785-9800 [doi]
- DINT TransformerYueyang Cang, Yuhang Liu, Xiaoteng Zhang, Erlu Zhao, Li Shi. 9801-9809 [doi]
- ICR: Iterative Clarification and Rewriting for Conversational SearchZhiyu Cao, Peifeng Li, Qiaoming Zhu. 9810-9824 [doi]
- Pre-training CLIP against Data Poisoning with Optimal Transport-based Matching and AlignmentTong Zhang, Kuofeng Gao, Jiawang Bai, Leo Yu Zhang, Xin Yin, Zonghui Wang, Shouling Ji, Wenzhi Chen. 9825-9838 [doi]
- Similarity = Value? Consultation Value-Assessment and Alignment for Personalized SearchWeicong Qin, Yi Xu 0003, Weijie Yu 0003, Teng Shi, Chenglei Shen, Ming He, Jianping Fan 0001, Xiao Zhang 0034, Jun Xu 0001. 9839-9852 [doi]
- RTQA: Recursive Thinking for Complex Temporal Knowledge Graph Question Answering with Large Language ModelsZhaoyan Gong, Juan Li 0010, Zhiqiang Liu, Lei Liang 0002, Huajun Chen, Wen Zhang 0015. 9853-9870 [doi]
- Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning PerformanceYao Wang, Di Liang, Minlong Peng. 9871-9885 [doi]
- AI Knows Where You Are: Exposure, Bias, and Inference in Multimodal Geolocation with KoreaGEOXiaonan Wang, Bo Shao, Hansaem Kim. 9886-9903 [doi]
- CAT: Causal Attention Tuning For Injecting Fine-grained Causal Knowledge into Large Language ModelsKairong Han, Wenshuo Zhao, Ziyu Zhao 0001, Ye Jun Jian, Lujia Pan, Kun Kuang. 9904-9921 [doi]
- Enhancing LLM Text Detection with Retrieved Contexts and Logits Distribution ConsistencyZhaoheng Huang, Yutao Zhu 0001, Ji-Rong Wen, Zhicheng Dou. 9922-9934 [doi]
- Measuring Chain of Thought Faithfulness by Unlearning Reasoning StepsMartin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, Yonatan Belinkov. 9935-9960 [doi]
- Stop Looking for "Important Tokens" in Multimodal Language Models: Duplication Matters MoreZichen Wen, Yifeng Gao 0003, Shaobo Wang 0001, Junyuan Zhang, Qintong Zhang, Weijia Li, Conghui He, Linfeng Zhang 0001. 9961-9980 [doi]
- AgentPro: Enhancing LLM Agents with Automated Process SupervisionYuchen Deng, Shichen Fan, Naibo Wang, Xinkui Zhao, See-Kiong Ng. 9981-10006 [doi]
- PORTS: Preference-Optimized Retrievers for Tool Selection with Large Language ModelsLorenzo Molfetta, Giacomo Frisoni, Nicolò Monaldini, Gianluca Moro. 10007-10030 [doi]
- MusKGC: A Flexible Multi-source Knowledge Enhancement Framework for Open-World Knowledge Graph CompletionXin Song, Liu Haiyan, Haiyang Wang, Ye Wang 0015, Kai Chen, Bin Zhou 0004. 10031-10049 [doi]
- Towards Transferable Personality Representation Learning based on Triplet Comparisons and Its ApplicationsKai Tang, Rui Wang 0076, Renyu Zhu, Minmin Lin, Xiao Ding, Tangjie Lv, Changjie Fan, Runze Wu 0001, Haobo Wang 0001. 10050-10066 [doi]
- Reshaping Representation Space to Balance the Safety and Over-rejection in Large Audio Language ModelsHao Yang, Lizhen Qu, Ehsan Shareghi, Gholamreza Haffari. 10067-10079 [doi]
- Benchmarking Large Language Models Under Data Contamination: A Survey from Static to Dynamic EvaluationSimin Chen, Yiming Chen 0010, Zexin Li 0001, Yifan Jiang, Zhongwei Wan, Yixin He 0002, Dezhi Ran, Tianle Gu, Haizhou Li 0001, Tao Xie 0001, Baishakhi Ray. 10080-10098 [doi]
- FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance DomainTiansheng Hu, Tongyan Hu, Liuyang Bai, Yilun Zhao 0001, Arman Cohan, Chen Zhao 0013. 10099-10128 [doi]
- RecGPT: A Foundation Model for Sequential RecommendationYangqin Jiang, Xubin Ren, Lianghao Xia, Da Luo, Kangyi Lin, Chao Huang 0001. 10129-10143 [doi]
- Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive SurveyChih-Kai Yang, Neo S. Ho, Hung-yi Lee. 10144-10170 [doi]
- Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and AccuracyNikita Balagansky, Yaroslav Aksenov, Daniil Laptev, Vadim Kurochkin, Gleb Gerasimov, Nikita Koriagin, Daniil Gavrilov. 10171-10179 [doi]
- Learn and Unlearn: Addressing Misinformation in Multilingual LLMsTaiming Lu, Philipp Koehn. 10180-10195 [doi]
- PRISM: Efficient Long-Range Reasoning With Short-Context LLMsDulhan Jayalath, James Bradley Wendt, Nicholas Monath, Sandeep Tata, Beliz Gunel. 10196-10218 [doi]
- Augmenting Multi-Agent Communication with State Delta TrajectoryYichen Tang 0001, Weihang Su, Yujia Zhou 0002, Yiqun Liu 0001, Min Zhang 0006, Shaoping Ma, Qingyao Ai. 10219-10240 [doi]
- SAEs Are Good for Steering - If You Select the Right FeaturesDana Arad, Aaron Mueller, Yonatan Belinkov. 10241-10259 [doi]
- CoBA: Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic TriplesKyohoon Jin, Juhwan Choi, Jungmin Yun, Junho Lee, Soojin Jang, Youngbin Kim. 10260-10278 [doi]
- Layered Insights: Generalizable Analysis of Human Authorial Style by Leveraging All Transformer Layers. Milad Alshomary, Nikhil Reddy Varimalla, Vishal Anand 0002, Smaranda Muresan, Kathleen McKeown. 10279-10292 [doi]
- When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models. Yingming Zheng, Hanqi Li, Kai Yu, Lu Chen. 10293-10308 [doi]
- A Case Against Implicit Standards: Homophone Normalization in Machine Translation for Languages that use the Ge'ez Script. Hellina Hailu Nigatu, Atnafu Lambebo Tonja, Henok Biadglign Ademtew, Hizkiel Mitiku Alemayehu, Negasi Haile Abadi, Tadesse Destaw Belay, Seid Muhie Yimam. 10309-10320 [doi]
- Evaluating Language Translation Models by Playing Telephone. Syeda Jannatus Saba, Steven Skiena. 10321-10336 [doi]
- Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs. Shuo Yang, Zheyu Zhang 0007, Bardh Prenkaj, Gjergji Kasneci. 10337-10358 [doi]
- SPaRC: A Spatial Pathfinding Reasoning Challenge. Lars Benedikt Kaesberg, Jan Philip Wahle, Terry Ruas, Bela Gipp. 10359-10390 [doi]
- Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training. Yao-Ching Yu, Tsun-Han Chiang, Cheng-Wei Tsai, Chien-Ming Huang, Wen-Kwang Tsao. 10391-10413 [doi]
- Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework. Yuhang Chen, Zhen Tan 0001, Ajay Kumar Jaiswal, Huaizhi Qu, Xinyu Zhao, Qi Lin, Yu Cheng 0001, Andrew Kwong, Zhichao Cao 0002, Tianlong Chen 0001. 10414-10424 [doi]
- Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models. Wei Jie Yeo, Ranjan Satapathy, Erik Cambria. 10425-10447 [doi]
- Calibrating LLM Confidence by Probing Perturbed Representation Stability. Reza Khanmohammadi, Erfan Miahi, Mehrsa Mardikoraem, Simerjot Kaur, Ivan Brugere, Charese Smiley, Kundan Thind, Mohammad M. Ghassemi. 10448-10514 [doi]
- SATER: A Self-Aware and Token-Efficient Approach to Routing and Cascading. Yuanzhe Shen, Yide Liu, Zisu Huang, Ruicheng Yin, Xiaoqing Zheng, Xuanjing Huang 0001. 10515-10529 [doi]
- DSG-MCTS: A Dynamic Strategy-Guided Monte Carlo Tree Search for Diversified Reasoning in Large Language Models. Rui Ha, Chaozhuo Li, Rui Pu, Litian Zhang, Xi Zhang 0008, Sen Su. 10530-10544 [doi]
- CIFLEX: Contextual Instruction Flow for Sub-task Execution in Multi-Turn Interactions with a Single On-Device LLM. Juntae Lee, Jihwan Bang, Seunghan Yang, Simyung Chang. 10545-10559 [doi]
- On the Role of Model Prior in Real-World Inductive Reasoning. Zhuo Liu, Ding Yu, Hangfeng He. 10560-10583 [doi]
- Viability of Machine Translation for Healthcare in Low-Resourced Languages. Hellina Hailu Nigatu, Nikita Mehandru, Negasi Haile Abadi, Blen Gebremeskel, Ahmed Alaa, Monojit Choudhury. 10584-10598 [doi]
- Latent Inter-User Difference Modeling for LLM Personalization. Yilun Qiu, Tianhao Shi, Xiaoyan Zhao 0005, Fengbin Zhu, Yang Zhang 0072, Fuli Feng. 10599-10617 [doi]
- IG-Pruning: Input-Guided Block Pruning for Large Language Models. Kangyu Qiao, Shaolei Zhang 0001, Yang Feng 0004. 10618-10629 [doi]
- Are Checklists Really Useful for Automatic Evaluation of Generative Tasks? Momoka Furuhashi, Kouta Nakayama, Takashi Kodama, Saku Sugawara. 10630-10653 [doi]
- Measuring the Effect of Disfluency in Multilingual Knowledge Probing Benchmarks. Kirill Semenov, Rico Sennrich. 10654-10672 [doi]
- Knowledge Editing through Chain-of-Thought. Changyue Wang, Weihang Su, Qingyao Ai, Yichen Tang 0001, Yiqun Liu 0001. 10673-10693 [doi]
- SelfRACG: Enabling LLMs to Self-Express and Retrieve for Code Generation. Qian Dong, Jia Chen 0003, Qingyao Ai, Hongning Wang, Haitao Li 0006, Yi Wu, Yao Hu 0002, Yiqun Liu 0001, Shaoping Ma. 10694-10705 [doi]
- Probing Logical Reasoning of MLLMs in Scientific Diagrams. Yufei Wang, Adriana Kovashka. 10706-10718 [doi]
- AdamS: Momentum Itself Can Be A Normalizer for LLM Pretraining and Post-training. Huishuai Zhang, Bohan Wang, Luoxin Chen. 10719-10738 [doi]
- Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls. Feiyang Kang, Newsha Ardalani, Michael Kuchnik, Youssef Emad, Mostafa Elhoushi, Shubhabrata Sengupta, Shang-wen Li 0001, Ramya Raghavendra, Ruoxi Jia 0001, Carole-Jean Wu. 10739-10758 [doi]
- Static or Dynamic: Towards Query-Adaptive Token Selection for Video Question Answering. Yumeng Shi, Quanyu Long, Wenya Wang 0001. 10759-10771 [doi]
- DischargeSim: A Simulation Benchmark for Educational Doctor-Patient Communication at Discharge. Zonghai Yao, Michael Sun, Won-Seok Jang, Sunjae Kwon, Soie Kwon, Hong Yu 0001. 10772-10798 [doi]
- Can Vision-Language Models Solve Visual Math Equations? Monjoy Narayan Choudhury, Junling Wang 0001, Yifan Hou, Mrinmaya Sachan. 10799-10808 [doi]
- From Scores to Steps: Diagnosing and Improving LLM Performance in Evidence-Based Medical Calculations. Benlu Wang, Iris Xia, Yifan Zhang, Junda Wang, Feiyun Ouyang, Shuo Han, Arman Cohan, Hong Yu 0001, Zonghai Yao. 10809-10833 [doi]
- Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge. Yi Sui, Chaozhuo Li, Chen Zhang 0013, Dawei Song 0001, Qiuchi Li. 10834-10858 [doi]
- Deep Associations, High Creativity: A Simple yet Effective Metric for Evaluating Large Language Models. Ziliang Qiu, Renfen Hu. 10859-10872 [doi]
- Identifying Unlearned Data in LLMs via Membership Inference Attacks. Advit Deepak, Megan Mou, Jing Huang, Diyi Yang. 10873-10892 [doi]
- Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models. Zihao Li, Xu Wang, Yuzhe Yang, Ziyu Yao 0002, Haoyi Xiong, Mengnan Du. 10893-10913 [doi]
- LLMs cannot spot math errors, even when allowed to peek into the solution. Kv Aditya Srivatsa, Kaushal Kumar Maurya, Ekaterina Kochmar. 10914-10928 [doi]
- Can LLMs be Good Graph Judge for Knowledge Graph Construction? Haoyu Huang, Chong Chen, Zeang Sheng, Yang Li, Wentao Zhang. 10929-10948 [doi]
- NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning. Zhi Zhang, Yixian Shen, Congfeng Cao, Ekaterina Shutova. 10949-10966 [doi]
- NileChat: Towards Linguistically Diverse and Culturally Aware LLMs for Local Communities. Abdellah El Mekki, Houdaifa Atou, Omer Nacar, Shady Shehata, Muhammad Abdul-Mageed. 10967-10991 [doi]
- A Computational Simulation of Language Production in First Language Acquisition. Yuan Gao, Weiwei Sun. 10992-11006 [doi]
- Long-Form Information Alignment Evaluation Beyond Atomic Facts. Danna Zheng, Mirella Lapata, Jeff Z. Pan. 11007-11027 [doi]
- Voice of a Continent: Mapping Africa's Speech Technology Frontier. AbdelRahim A. Elmadany, Sang Yun Kwon, Hawau Olamide Toyin, Alcides Alcoba Inciarte, Hanan Aldarmaki, Muhammad Abdul-Mageed. 11028-11050 [doi]
- Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains. Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma 0001. 11051-11079 [doi]
- Circuit Complexity Bounds for RoPE-based Transformer Architecture. Bo Chen 0029, Xiaoyu Li, Yingyu Liang, Jiangxuan Long 0001, Zhenmei Shi, Zhao Song 0002, Jiahao Zhang. 11080-11097 [doi]
- Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments. Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma 0001. 11098-11126 [doi]
- Towards Infinite-Long Prefix in Transformer. Yingyu Liang, Zhenmei Shi, Zhao Song 0002, Chiwun Yang. 11127-11191 [doi]
- LATTE: Learning to Think with Vision Specialists. Zixian Ma, Jianguo Zhang 0006, Zhiwei Liu 0001, Jieyu Zhang 0001, Juntao Tan, Manli Shu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang 0014, Caiming Xiong, Ranjay Krishna, Silvio Savarese. 11192-11229 [doi]
- SUA: Stealthy Multimodal Large Language Model Unlearning Attack. Xianren Zhang, Hui Liu 0031, Delvin Ce Zhang, Xianfeng Tang, Qi He 0002, Dongwon Lee 0001, Suhang Wang. 11230-11243 [doi]
- ResFormer: All-Time Reservoir Memory for Long Sequence Classification. Hongbo Liu, Jia Xu. 11244-11256 [doi]
- Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models. Zeping Yu, Yonatan Belinkov, Sophia Ananiadou. 11257-11272 [doi]
- Interdisciplinary Research in Conversation: A Case Study in Computational Morphology for Language Documentation. Enora Rice, Katharina von der Wense, Alexis Palmer. 11273-11285 [doi]
- Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction. Huanxin Sheng, Xinyi Liu, Hangfeng He 0001, Jieyu Zhao, Jian Kang. 11286-11328 [doi]
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time. Junyu Zhang, Runpei Dong, Han Wang 0019, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta 0001, Huan Zhang 0001. 11329-11354 [doi]
- Dual-Path Dynamic Fusion with Learnable Query for Multimodal Sentiment Analysis. Miao Zhou, Lina Yang, Thomas Wu, Dongnan Yang, Xinru Zhang. 11355-11365 [doi]
- CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners. Yunzhi Yao, Jizhan Fang, Jia-Chen Gu, Ningyu Zhang 0001, Shumin Deng, Huajun Chen, Nanyun Peng 0001. 11366-11382 [doi]
- DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic. Yuheng Wu, Jianwen Xie, Denghui Zhang, Zhaozhuo Xu. 11383-11397 [doi]
- Collaborative Beam Search: Enhancing LLM Reasoning via Collective Consensus. Yangyifan Xu, Shuo Ren, Jiajun Zhang. 11398-11410 [doi]
- Deriving Strategic Market Insights with Large Language Models: A Benchmark for Forward Counterfactual Generation. Keane Ong, Rui Mao 0010, Deeksha Varshney, Paul Pu Liang, Erik Cambria, Gianmarco Mengaldo. 11411-11434 [doi]
- Towards Statistical Factuality Guarantee for Large Vision-Language Models. Zhuohang Li, Chao Yan 0004, Nicholas J. Jackson, Wendi Cui, Bo Li 0026, Jiaxin Zhang 0005, Bradley A. Malin. 11435-11456 [doi]
- Unlearning vs. Obfuscation: Are We Truly Removing Knowledge? Guangzhi Sun, Potsawee Manakul, Xiao Zhan, Mark J. F. Gales. 11457-11467 [doi]
- Reward-Shifted Speculative Sampling Is An Efficient Test-Time Weak-to-Strong Aligner. Bolian Li, Yanran Wu, Xinyu Luo, Ruqi Zhang. 11468-11478 [doi]
- Stimulate the Critical Thinking of LLMs via Debiasing Discussion. Ruiyu Xiao, Lei Wu 0014, Yuanxing Liu 0001, Weinan Zhang 0003, Ting Liu 0001. 11479-11492 [doi]
- Toward Multi-Session Personalized Conversation: A Large-Scale Dataset and Hierarchical Tree Framework for Implicit Reasoning. Xintong Li 0001, Jalend Bantupalli, Ria Dharmani, Yuwei Zhang 0001, Jingbo Shang. 11493-11506 [doi]
- Improving Instruct Models for Free: A Study on Partial Adaptation. Ozan Irsoy, Pengxiang Cheng 0001, Jennifer L. Chen, Daniel Preotiuc-Pietro, Shiyue Zhang 0001, Duccio Pappadopulo. 11507-11521 [doi]
- CoMMIT: Coordinated Multimodal Instruction Tuning. Xintong Li 0001, Junda Wu, Tong Yu 0001, Rui Wang 0088, Yu Wang, Xiang Chen 0010, Jiuxiang Gu, Lina Yao 0001, Julian J. McAuley, Jingbo Shang. 11522-11536 [doi]
- Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge. Tianhao Wu 0002, Weizhe Yuan, Olga Golovneva, Jing Xu 0014, Yuandong Tian, Jiantao Jiao, Jason E. Weston, Sainbayar Sukhbaatar. 11537-11554 [doi]
- AnyMAC: Cascading Flexible Multi-Agent Collaboration via Next-Agent Prediction. Song Wang 0013, Zhen Tan 0001, Zihan Chen 0002, Shuang Zhou 0012, Tianlong Chen 0001, Jundong Li. 11555-11567 [doi]
- A Good Plan is Hard to Find: Aligning Models with Preferences is Misaligned with What Helps Users. Nishant Balepur, Matthew Shu, Yoo yeon Sung, Seraphina Goldfarb-Tarrant, Shi Feng 0005, Fumeng Yang, Rachel Rudinger, Jordan Lee Boyd-Graber. 11568-11595 [doi]
- Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication. Jocelyn J. Shen, Akhila Yerukola, Xuhui Zhou, Cynthia Breazeal, Maarten Sap, Hae Won Park 0001. 11596-11614 [doi]
- Separate the Wheat from the Chaff: Winnowing Down Divergent Views in Retrieval Augmented Generation. Song Wang 0013, Zihan Chen 0002, Peng Wang 0105, Zhepei Wei, Zhen Tan 0001, Yu Meng, Cong Shen 0001, Jundong Li. 11615-11631 [doi]
- Cognitive Linguistic Identity Fusion Score (CLIFS): A Scalable Cognition-Informed Approach to Quantifying Identity Fusion from Text. Devin R. Wright, Jisun An, Yong-Yeol Ahn. 11632-11662 [doi]
- SilVar: Speech-Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization. Tan-Hanh Pham, Hoang Nam Le, Phu-Vinh Nguyen, Chris Ngo, Truong-Son Hy. 11663-11674 [doi]
- CEMTM: Contextual Embedding-based Multimodal Topic Modeling. Amirhossein Abaskohi, Raymond Li, Chuyuan Li, Shafiq Joty, Giuseppe Carenini. 11675-11692 [doi]
- RedHerring Attack: Testing the Reliability of Attack Detection. Jonathan Rusert. 11693-11708 [doi]
- Modeling Bottom-up Information Quality during Language Processing. Cui Ding, Yanning Yin, Lena Ann Jäger, Ethan Wilcox. 11709-11721 [doi]
- Data Drives Unstable Hierarchical Generalization in LMs. Tian Qin, Naomi Saphra, David Alvarez-Melis. 11722-11740 [doi]
- EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety. Jiahao Qiu, Yinghui He, Xinzhe Juan, Yimin Wang, Yuhan Liu, Zixin Yao, Yue Wu, Xun Jiang, Ling Yang 0006, Mengdi Wang 0001. 11741-11756 [doi]
- Polysemantic Dropout: Conformal OOD Detection for Specialized LLMs. Ayush Gupta, Ramneet Kaur, Anirban Roy, Adam D. Cobb, Rama Chellappa, Susmit Jha. 11757-11770 [doi]
- Facilitating Cognitive Accessibility with LLMs: A Multi-Task Approach to Easy-to-Read Text Generation. François Ledoyen, Gaël Dias, Jérémie Pantin, Alexis Lechervy, Fabrice Maurel, Youssef Chahir. 11771-11797 [doi]
- D-CoDe: Scaling Image-Pretrained VLMs to Video via Dynamic Compression and Question Decomposition. Yiyang Huang, Yizhou Wang, Yun Fu. 11798-11811 [doi]
- ReEvalMed: Rethinking Medical Report Evaluation by Aligning Metrics with Real-World Clinical Judgment. Ruochen Li, Jun Li, Bailiang Jian, Kun Yuan, Youxiang Zhu. 11812-11826 [doi]
- MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation. Khai Le-Duc, Tuyen Tran, Bach Phan Tat, Nguyen Kim Hai Bui, Quan Dang Anh, Hung Phong Tran, Thanh Thuy Nguyen, Ly Nguyen, Tuan-Minh Phan, Thi Thu Phuong Tran, Chris Ngo, Nguyen X. Khanh, Thanh Nguyen-Tang. 11827-11952 [doi]
- Beyond Checkmate: Exploring the Creative Choke Points for AI Generated Texts. Nafis Irtiza Tripto, Saranya Venkatraman, Mahjabin Nahar, Dongwon Lee 0001. 11953-11970 [doi]
- MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers. Jushaan Singh Kalra, Xinran Zhao, To Eun Kim, Fengyu Cai, Fernando Diaz 0001, Tongshuang Wu. 11971-11990 [doi]
- Learning Contextual Retrieval for Robust Conversational Search. Seunghan Yang, Juntae Lee, Jihwan Bang, Kyuhong Shim, Minsoo Kim, Simyung Chang. 11991-12003 [doi]
- LIDDIA: Language-based Intelligent Drug Discovery Agent. Reza Averly, Frazier N. Baker, Ian A Watson, Xia Ning. 12004-12028 [doi]
- Agentic-R1: Distilled Dual-Strategy Reasoning. Weihua Du, Pranjal Aggarwal, Sean Welleck, Yiming Yang 0002. 12029-12043 [doi]
- Proactive Assistant Dialogue Generation from Streaming Egocentric Videos. Yichi Zhang 0001, Xin Luna Dong, Zhaojiang Lin, Andrea Madotto, Anuj Kumar, Babak Damavandi, Joyce Chai, Seungwhan Moon. 12044-12068 [doi]
- Should I Share this Translation? Evaluating Quality Feedback for User Reliance on Machine Translation. Dayeon Ki, Kevin Duh, Marine Carpuat. 12069-12092 [doi]
- ChartGaze: Enhancing Chart Understanding in LVLMs with Eye-Tracking Guided Attention Refinement. Ali Salamatian, Amirhossein Abaskohi, Wan-Cyuan Fan, Mir Rayat Imtiaz Hossain, Leonid Sigal, Giuseppe Carenini. 12093-12113 [doi]
- LogiCoL: Logically-Informed Contrastive Learning for Set-based Dense Retrieval. Yanzhen Shen, Sihao Chen, Xueqiang Xu, Yunyi Zhang, Chaitanya Malaviya, Dan Roth 0001. 12114-12125 [doi]
- ModalPrompt: Towards Efficient Multimodal Continual Instruction Tuning with Dual-Modality Guided Prompt. Fanhu Zeng, Fei Zhu, Haiyang Guo, Xu-Yao Zhang, Cheng-Lin Liu. 12126-12141 [doi]
- Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster. Xiaoshu Chen, Sihang Zhou, Ke Liang, Xiaoyu Sun, Xinwang Liu. 12142-12157 [doi]
- Can an Individual Manipulate the Collective Decisions of Multi-Agents? Fengyuan Liu, Rui Zhao 0001, Shuo Chen 0014, Guohao Li, Philip Torr 0001, Lei Han, Jindong Gu. 12158-12182 [doi]
- Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages. Yujia Hu, Ming Shan Hee, Preslav Nakov, Roy Ka-Wei Lee. 12183-12201 [doi]
- Improving Clustering with Positive Pairs Generated from LLM-Driven Labels. Xiaotong Zhang, Ying Li. 12202-12218 [doi]
- Gamma-Guard: Lightweight Residual Adapters for Robust Guardrails in Large Language Models. Lijia Lv, Yuanshu Zhao, Guan Wang, Xuehai Tang, Jie Wen 0007, Jizhong Han, Songlin Hu 0001. 12219-12231 [doi]
- Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning. Jingyang Lin, Andy Wong, Tian Xia, Shenghua He, Hui Wei, Mei Han, Jiebo Luo 0001. 12232-12248 [doi]
- Dynamic Energy-Based Contrastive Learning with Multi-Stage Knowledge Verification for Event Causality Identification. Ya Su, Hu Zhang 0003, Yue Fan, Guangjun Zhang, Yujie Wang 0003, Ru Li 0001, Hongye Tan. 12249-12267 [doi]
- ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment. Zhipeng Bian, Jieming Zhu, Qijiong Liu, Wang Lin, Guohao Cai, Zhaocheng Du, Jiacheng Sun, Zhou Zhao 0001, Zhenhua Dong. 12268-12278 [doi]
- From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement. Jianzhi Yan, Le Liu, Youcheng Pan, Shiwei Chen, Zike Yuan, Yang Xiang 0003, Buzhou Tang. 12279-12295 [doi]
- A Symbolic Adversarial Learning Framework for Evolving Fake News Generation and Detection. Chong Tian, Qirong Ho, Xiuying Chen. 12296-12310 [doi]
- RareSyn: Health Record Synthesis for Rare Disease Diagnosis. Huimin Wang, Yutian Zhao, Yefeng Zheng 0001, Xian Wu 0001. 12311-12327 [doi]
- Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework. Jie Chen 0007, Jinhao Jiang, Yingqian Min, Zican Dong, Shijie Wang, Wayne Xin Zhao, Ji-Rong Wen. 12328-12338 [doi]
- CMHG: A Dataset and Benchmark for Headline Generation of Minority Languages in China. Guixian Xu, Zeli Su, Ziyin Zhang, Jianing Liu, Xu Han, Ting Zhang, Yushuang Dong. 12339-12346 [doi]
- Understanding the Information Propagation Effects of Communication Topologies in LLM-based Multi-Agent Systems. Xu Shen 0002, Yixin Liu 0001, Yiwei Dai, Yili Wang 0004, Rui Miao 0003, Yue Tan, Shirui Pan, Xin Wang 0035. 12347-12361 [doi]
- Boosting Data Utilization for Multilingual Dense Retrieval. Chao Huang, Fengran Mo, Yufeng Chen 0005, Changhao Guan, Zhenrui Yue, Xinyu Wang, Jinan Xu, Kaiyu Huang. 12362-12378 [doi]
- Self-Augmented Preference Alignment for Sycophancy Reduction in LLMs. Chien-Hung Chen, Hen-Hsen Huang, Hsin-Hsi Chen. 12379-12391 [doi]
- TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning. Hang Ni, Fan Liu, Xinyu Ma, Lixin Su, Shuaiqiang Wang, Dawei Yin 0001, Hui Xiong 0001, Hao Liu 0026. 12392-12418 [doi]
- Recontextualizing Revitalization: A Mixed Media Approach to Reviving the Nüshu Language. Ivory Yang, Xiaobo Guo, Yuxin Wang, Hefan Zhang 0001, Yaning Jia, William Dinauer, Soroush Vosoughi. 12419-12428 [doi]
- Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving. Chuxue Cao, Mengze Li 0001, Juntao Dai, Jinluan Yang, Zijian Zhao 0002, Shengyu Zhang 0001, Weijie Shi, Chengzhong Liu, Sirui Han, Yike Guo. 12429-12449 [doi]
- From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition. Tianduo Wang, Lu Xu, Wei Lu, Shanbo Cheng. 12450-12464 [doi]
- CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space. Yong Zhao, Kai Xu 0014, Zhengqiu Zhu, Yue Hu 0016, Zhiheng Zheng, Yingfeng Chen, Yatai Ji, Chen Gao 0001, Yong Li 0008, Jincai Huang 0001. 12465-12480 [doi]
- Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression. Sreetama Sarkar, Yue Che, Alex Gavin, Peter Anthony Beerel, Souvik Kundu 0002. 12481-12500 [doi]
- Examining False Positives under Inference Scaling for Mathematical Reasoning. Yu Wang 0089, Nan Yang, Liang Wang, Furu Wei, Fuli Feng. 12501-12520 [doi]
- Translationese-index: Using Likelihood Ratios for Graded and Generalizable Measurement of Translationese. Yikang Liu 0002, Wanyang Zhang, Yiming Wang 0011, Jialong Tang, Pei Zhang 0011, Baosong Yang, Fei Huang 0002, Rui Wang 0015, Hai Hu 0001. 12521-12538 [doi]
- Exploring the Limitations of Mamba in COPY and CoT Reasoning. Ruifeng Ren, Zhicong Li, Yong Liu. 12539-12563 [doi]
- ProcWorld: Benchmarking Large Model Planning in Reachability-Constrained Environments. Dong Wang, Xinghang Li, Zhengshen Zhang, Jirong Liu, Xiao Ma 0006, Hanbo Zhang, Tao Kong, Huaping Liu 0001. 12564-12594 [doi]
- R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation. Kaijie Chen, Zihao Lin 0003, Zhiyang Xu, Ying Shen 0006, Yuguang Yao, Joy Rimchala, Jiaxin Zhang 0005, Lifu Huang. 12595-12630 [doi]
- Can GRPO Boost Complex Multimodal Table Understanding? Xiaoqiang Kang, Shengen Wu, Zimu Wang, Yilin Liu, Xiaobo Jin, Kaizhu Huang, Wei Wang 0042, Yutao Yue, Xiaowei Huang, Qiufeng Wang. 12631-12644 [doi]
- MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance. Agam Goyal, Xianyang Zhan, Yilun Chen, Koustuv Saha, Eshwar Chandrasekharan. 12645-12660 [doi]
- Following the Autoregressive Nature of LLM Embeddings via Compression and Alignment. Jingcheng Deng, Zhongtao Jiang, Liang Pang 0001, Zihao Wei, Liwei Chen, Kun Xu 0005, Yang Song 0008, Huawei Shen, Xueqi Cheng. 12661-12677 [doi]
- Evaluating LLM-Generated Diagrams as Graphs. Chumeng Liang, Jiaxuan You. 12678-12690 [doi]
- Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders. Agam Goyal, Vedant Rathi, William Yeh, Yian Wang, Yuen Chen, Hari Sundaram. 12691-12709 [doi]
- VCSearch: Bridging the Gap Between Well-Defined and Ill-Defined Problems in Mathematical Reasoning. Shi-Yu Tian, Zhi Zhou 0007, Kun-Yang Yu, Ming Yang, Lin-Han Jia, Lan-Zhe Guo, Yufeng Li 0008. 12710-12731 [doi]
- How do autoregressive transformers solve full addition? Wang Peixu, Chen Yu, Yu Ming, Cheng Xiang. 12732-12756 [doi]
- MAIN: Mutual Alignment Is Necessary for instruction tuning. Fanyi Yang, Jianfeng Liu, Xin Zhang, Haoyu Liu 0002, Xixin Cao, Yuefeng Zhan, Hao Sun 0015, Weiwei Deng, Feng Sun 0008, Qi Zhang 0066. 12757-12769 [doi]
- Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation. Dingwei Chen, Ziqiang Liu, Feiteng Fang, Chak Tou Leong, Shiwen Ni, Ahmadreza Argha, Hamid Alinejad-Rokny, Min Yang 0007, Chengming Li 0004. 12770-12785 [doi]
- DeepWell-Adol: A Scalable Expert-Based Dialogue Corpus for Adolescent Positive Mental Health and Wellbeing Promotion. Wenyu Qiu, Yuxiong Wang, Jiajun Tan, Hanchao Hou, Qinda Liu, Wei Yao, Shiguang Ni. 12786-12810 [doi]
- Data to Defense: The Role of Curation in Aligning Large Language Models Against Safety Compromise. Xiaoqun Liu, Jiacheng Liang, Luoxi Tang, Muchao Ye, Weicheng Ma, Zhaohan Xi. 12811-12826 [doi]
- Speculative Safety-Aware Decoding. Xuekang Wang, Shengyu Zhu, Xueqi Cheng. 12827-12841 [doi]
- PanicToCalm: A Proactive Counseling Agent for Panic Attacks. Jihyun Lee, Yejin Min, San Kim, Yejin Jeon, Sungjun Yang, Hyounghun Kim, Gary Lee. 12842-12874 [doi]
- CoPL: Collaborative Preference Learning for Personalizing LLMs. Youngbin Choi, Seunghyuk Cho, Minjong Lee, Moonjeong Park, Yesong Ko, Jungseul Ok, Dongwoo Kim 0002. 12875-12893 [doi]
- Dynamic Collaboration of Multi-Language Models based on Minimal Complete Semantic Units. Chao Hao, Zezheng Wang, Yanhua Huang, Ruiwen Xu, Wenzhe Niu, Xin Liu, Zitong Yu. 12894-12911 [doi]
- AI Chatbots as Professional Service Agents: Developing a Professional Identity. Wenwen Li, Kangwei Shi, Yidong Chai. 12912-12925 [doi]
- DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning. Zhuoyuan Mao, Mengjie Zhao, Qiyu Wu 0001, Hiromi Wakaki, Yuki Mitsufuji. 12926-12948 [doi]
- Advancing Oversight Reasoning across Languages for Audit Sycophantic Behaviour via X-Agent. Giulia Pucci, Leonardo Ranaldi. 12949-12965 [doi]
- CAFE: Retrieval Head-based Coarse-to-Fine Information Seeking to Enhance Multi-Document QA Capability. Han Peng, Jinhao Jiang, Zican Dong, Wayne Xin Zhao, Lei Fang. 12966-12978 [doi]
- SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages? Senyu Li, Jiayi Wang, Felermino D. M. A. Ali, Colin Cherry, Daniel Deutsch, Eleftheria Briakou, Rui Sousa-Silva, Henrique Lopes Cardoso, Pontus Stenetorp, David Ifeoluwa Adelani. 12979-12998 [doi]
- FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge. Nakyeong Yang, MinSung Kim, Seunghyun Yoon 0002, Joongbo Shin, Kyomin Jung. 12999-13014 [doi]
- Calibrating Pseudo-Labeling with Class Distribution for Semi-supervised Text Classification. Weiyi Yang, Richong Zhang, Junfan Chen 0001, Jiawei Sheng. 13015-13028 [doi]
- Coarse-to-Fine Grounded Memory for LLM Agent Planning. Wei Yang, Jinwei Xiao, Hongming Zhang, Qingyang Zhang, Yanna Wang, Bo Xu. 13029-13056 [doi]
- From A and B to A+B: Can Large Language Models Solve Compositional Math Problems? Xisheng Xiao, Hanlin Zhao. 13057-13078 [doi]
- Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories. Mohammad Beigi, Ying Shen 0006, Parshin Shojaee, Qifan Wang 0001, Zichao Wang 0001, Chandan K. Reddy, Ming Jin 0002, Lifu Huang. 13079-13092 [doi]
- SimVBG: Simulating Individual Values by Backstory Generation. Bangde Du, Ziyi Ye, Zhijing Wu 0001, Monika Jankowska, Shuqi Zhu, Qingyao Ai, Yujia Zhou 0002, Yiqun Liu 0001. 13093-13122 [doi]
- EvolveSearch: An Iterative Self-Evolving Search Agent. Dingchu Zhang, Yida Zhao, Jialong Wu 0007, Liwen Zhang, Baixuan Li, Wenbiao Yin, Yong Jiang 0005, Yu-Feng Li, Kewei Tu, Pengjun Xie, Fei Huang 0002. 13123-13136 [doi]
- Syntax-Aware Retrieval Augmentation for Neural Symbolic Regression. Canmiao Zhou, Han Huang 0002. 13137-13147 [doi]
- Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs. Dingkun Zhang, Shuhan Qi, Xinyu Xiao, Kehai Chen, Xuan Wang 0002. 13148-13164 [doi]
- Graceful Forgetting in Generative Language Models. Chunyang Jiang, Chi-Min Chan, Yiyang Cai, Yulong Liu, Wei Xue 0002, Yike Guo. 13165-13180 [doi]
- Answering Narrative-Driven Recommendation Queries via a Retrieve-Rank Paradigm and the OCG-Agent. Yunxiao Shi, Haoning Shang, Xing Zi, Wujiang Xu, Yue Feng, Min Xu. 13181-13202 [doi]
- Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values. Hongbo Zhang, Han Cui, Guangsheng Bao, Linyi Yang, Jun Wang, Yue Zhang. 13203-13216 [doi]
- Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility. Brendan Murphy, Dillon Bowen, Shahrad Mohammadzadeh, Tom Tseng, Julius Broomfield, Adam Gleave, Kellin Pelrine. 13217-13246 [doi]
- Neural Topic Modeling via Contextual and Graph Information Fusion. Jiyuan Liu 0013, Jiaxing Yan, Chunjiang Zhu, Xingyu Liu, Li Qing, Yanghui Rao. 13247-13263 [doi]
- CARE: A Disagreement Detection Framework with Concept Alignment and Reasoning Enhancement. Jiyuan Liu 0013, Jielin Song, Yunhe Pang 0001, Zhiyu Shen, Yanghui Rao. 13264-13279 [doi]
- Beyond Task-Oriented and Chitchat Dialogues: Proactive and Transition-Aware Conversational Agents. Yejin Yoon, Yuri Son, Namyoung So, Minseo Kim, Minsoo Cho, Chanhee Park, Seungshin Lee, Taeuk Kim. 13280-13306 [doi]
- LightThinker: Thinking Step-by-Step Compression. Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng 0004, Huajun Chen, Ningyu Zhang 0001. 13307-13328 [doi]
- How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark. Minglai Yang 0002, Ethan Huang, Liang Zhang, Mihai Surdeanu, William Yang Wang, Liangming Pan. 13329-13347 [doi]
- Investigating Pedagogical Teacher and Student LLM Agents: Genetic Adaptation Meets Retrieval-Augmented Generation Across Learning Styles. Debdeep Sanyal, Agniva Maiti, Umakanta Maharana, Dhruv Kumar 0001, Ankur Arjun Mali, C. Lee Giles, Murari Mandal. 13348-13389 [doi]
- GeoEdit: Geometric Knowledge Editing for Large Language Models. Yujie Feng, Li-Ming Zhan, Zexin Lu, Yongxin Xu, Xu Chu, Yasha Wang, Jiannong Cao 0001, Philip S. Yu, Xiao-Ming Wu 0003. 13390-13405 [doi]
- A Generative Pre-Trained Language Model for Channel Prediction in Wireless Communications Systems. Bo Li 0026, Huanming Zhang, Yuhua Jiang, Yucong Wang, Tengyu Zhang, Shaoqiang Yan, Hongyao Li, Yihong Liu, Feifei Gao. 13406-13419 [doi]
- AIMMerging: Adaptive Iterative Model Merging Using Training Trajectories for Language Model Continual Learning. Yujie Feng, Jian Li, Xiaoyu Dong, Pengfei Xu, Xiaohui Zhou, Yujia Zhang, Zexin Lu, Yasha Wang, Alan Zhao, Xu Chu, Xiao-Ming Wu. 13420-13437 [doi]
- R-PRM: Reasoning-Driven Process Reward Modeling. Shuaijie She, Junxiao Liu, Yifeng Liu, Jiajun Chen, Xin Huang, Shujian Huang. 13438-13451 [doi]
- RLAE: Reinforcement Learning-Assisted Ensemble for LLMs. Yuqian Fu, Yuanheng Zhu, Jiajun Chai, Guojun Yin, Wei Lin, Qichao Zhang, Dongbin Zhao. 13452-13466 [doi]
- Do Large Language Models Truly Grasp Addition? A Rule-Focused Diagnostic Using Two-Integer Arithmetic. Yang Yan, Yu Lu, Renjun Xu, Zhenzhong Lan. 13467-13483 [doi]
- AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification. Xuan Zhang, Yongliang Shen 0001, Zhe Zheng, Linjuan Wu, Wenqi Zhang 0001, Yuchen Yan, Qiuying Peng, Jun Wang, Weiming Lu 0001. 13484-13511 [doi]
- START: Self-taught Reasoner with Tools. Chengpeng Li 0001, Mingfeng Xue, Zhenru Zhang, Jiaxi Yang 0004, Beichen Zhang, Bowen Yu 0002, Binyuan Hui, Junyang Lin, Xiang Wang 0010, Dayiheng Liu. 13512-13553 [doi]
- The Impact of Negated Text on Hallucination with Large Language Models. Jaehyung Seo, Hyeonseok Moon, HeuiSeok Lim. 13554-13572 [doi]
- A Probabilistic Inference Scaling Theory for LLM Self-Correction. Zhe Yang, Yichang Zhang, Yudong Wang, Ziyao Xu, Junyang Lin, Zhifang Sui. 13573-13587 [doi]
- MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media. Wei Zhai, Nan Bai, Qing Zhao 0005, Jianqiang Li 0002, Fan Wang, Hongzhi Qi, Meng Jiang, Xiaoqin Wang, Bing Xiang Yang, Guanghui Fu. 13588-13603 [doi]
- Knowledge-Aware Co-Reasoning for Multidisciplinary Collaboration. Xurui Li, Wanghaijiao, Kaisong Song, Rui Zhu, Haixu Tang. 13604-13620 [doi]
- Astra: Efficient Transformer Architecture and Contrastive Dynamics Learning for Embodied Instruction Following. Yueen Ma 0001, Dafeng Chi, Shiguang Wu 0004, Yuecheng Liu, Yuzheng Zhuang, Irwin King. 13621-13639 [doi]
- MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation. Woohyun Cho, Youngmin Kim, Sunghyun Lee, Youngjae Yu. 13640-13668 [doi]
- MuTIS: Enhancing Reasoning Efficiency through Multi Turn Intervention Sampling in Reinforcement Learning. Wenshuo Zhao, Haoxing Zhai, Xinyu Qiu, Zhenting Qi, Shuhe Li, Linchao Zhu. 13669-13681 [doi]
- PRIM: Towards Practical In-Image Multilingual Machine Translation. Yanzhi Tian, Zeming Liu, Zhengyang Liu, Chong Feng 0001, Xin Li, Heyan Huang, Yuhang Guo 0001. 13682-13697 [doi]
- Mind the Inclusivity Gap: Multilingual Gender-Neutral Translation Evaluation with mGeNTE. Beatrice Savoldi, Giuseppe Attanasio, Eleonora Cupin, Eleni Gkovedarou, Janiça Hackenbuchner, Anne Lauscher, Matteo Negri, Andrea Piergentili, Manjinder Thind, Luisa Bentivogli. 13698-13720 [doi]
- DiplomacyAgent: Do LLMs Balance Interests and Ethical Principles in International Events? Jianxiang Peng, Ling Shi 0004, Xinwei Wu 0001, Hanwen Zhang, Fujiang Liu, Haocheng Lyu, Deyi Xiong. 13721-13739 [doi]
- DisLoRA: Task-specific Low-Rank Adaptation via Orthogonal Basis from Singular Value Decomposition. She Yifei, Xinhao Wei, Yulong Wang. 13740-13755 [doi]
- Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering. Zixin Chen, Sicheng Song, Kashun Shum, Yanna Lin, Rui Sheng, Weiqi Wang, Huamin Qu. 13756-13789 [doi]
- Textual Aesthetics in Large Language Models. Lingjie Jiang, Shaohan Huang, Xun Wu, Furu Wei. 13790-13818 [doi]
- Section-Level Simplification of Biomedical Abstracts. Jan Bakker, Jaap Kamps. 13819-13833 [doi]
- PoseStitch-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation. Abhinav Joshi, Vaibhav Sharma, Sanjeet Singh, Ashutosh Modi. 13834-13853 [doi]
- Few-Shot Open-Set Classification via Reasoning-Aware Decomposition. Avyav Kumar Singh, Helen Yannakoudakis. 13854-13875 [doi]
- Translation in the Hands of Many: Centering Lay Users in Machine Translation Interactions. Beatrice Savoldi, Alan Ramponi, Matteo Negri, Luisa Bentivogli. 13876-13889 [doi]
- iTool: Reinforced Fine-Tuning with Dynamic Deficiency Calibration for Advanced Tool UseYirong Zeng, Xiao Ding, Yuxian Wang, Weiwen Liu, Yutai Hou, Wu Ning, Xu Huang 0008, Duyu Tang, Dandan Tu, Bing Qin 0001, Ting Liu 0001. 13890-13905 [doi]
- Transplant Then Regenerate: A New Paradigm for Text Data AugmentationGuangzhan Wang, Hongyu Zhang 0002, Beijun Shen, Xiaodong Gu 0002. 13906-13920 [doi]
- Compositional Generalisation for Explainable Hate Speech DetectionAgostina Calabrese, Tom Sherborne, Björn Ross, Mirella Lapata. 13921-13943 [doi]
- CCQA: Generating Question from Solution Can Improve Inference-Time Reasoning in SLMsJinyoung Kim, Ji Won Yoon. 13944-13956 [doi]
- TVQACML: Benchmarking Text-Centric Visual Question Answering in Multilingual Chinese Minority LanguagesJiu Sha, Yu Weng, Mengxiao Zhu, Chong Feng, Zheng Liu, Jialedongzhu. 13957-13967 [doi]
- Transparent and Coherent Procedural Mistake DetectionShane Storks, Itamar Bar-Yossef, Yayuan Li, Zheyuan Zhang, Jason J. Corso, Joyce Chai. 13968-14002 [doi]
- Teaching Your Models to Understand Code via Focal Preference AlignmentJie Wu 0001, Haoling Li, Xin Zhang 0099, Xiao Liu 0029, Yangyu Huang, Jianwen Luo, Yizhen Zhang, Zuchao Li, Ruihang Chu, Yujiu Yang 0001, Scarlett Li. 14003-14023 [doi]
- MoLoRAG: Bootstrapping Document Understanding via Multi-modal Logic-aware RetrievalXixi Wu, Yanchao Tan, Nan Hou, Ruiyang Zhang, Hong Cheng 0001. 14024-14045 [doi]
- Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene DescriptionsIoanna Ntinou, Alexandros Xenos, Yassine Ouali, Adrian Bulat, Georgios Tzimiropoulos. 14046-14062 [doi]
- TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document ReasoningXiaohan Yu, Pu Jian, Chong Chen. 14063-14082 [doi]
- Retrieval Enhanced Feedback via In-context Neural Error-bookJongyeop Hyun, Bumsoo Kim. 14083-14098 [doi]
- Improve LLM-as-a-Judge Ability as a General AbilityJiachen Yu, Shaoning Sun, Xiaohui Hu, Jiaxu Yan, Kaidong Yu, Xuelong Li. 14099-14115 [doi]
- G2: Guided Generation for Enhanced Output Diversity in LLMsZhiwen Ruan, Yixia Li, Yefeng Liu, Yun Chen, Weihua Luo, Peng Li, Yang Liu, Guanhua Chen. 14116-14134 [doi]
- ToolSafety: A Comprehensive Dataset for Enhancing Safety in LLM-Based Agent Tool InvocationsYuejin Xie, Youliang Yuan, Wenxuan Wang 0001, Fan Mo, Jianmin Guo, Pinjia He. 14135-14156 [doi]
- Learning to See through Sound: From VggCaps to Multi2Cap for Richer Automated Audio CaptioningSangyeon Cho, Mingi Kim, Jinkwon Hwang, Jaehoon Go, Minuk Ma, Sunjae Yoon, Junyeong Kim. 14157-14175 [doi]
- Towards Optimal Evaluation Efficiency for Large Language ModelsGuohong Li, Deyi Xiong. 14176-14183 [doi]
- MMAPG: A Training-Free Framework for Multimodal Multi-hop Question Answering via Adaptive Planning GraphsYiheng Hu, Xiaoyang Wang 0002, Qing Liu 0001, Xiwei Xu 0001, Qian Fu, Wenjie Zhang 0001, Liming Zhu 0001. 14184-14200 [doi]
- Mixture-of-Clustered-Experts: Advancing Expert Specialization and Generalization in Instruction TuningSugyeong Eo, Jung-Jun Lee, Chanjun Park, HeuiSeok Lim. 14201-14212 [doi]
- Process-Supervised Reinforcement Learning for Code GenerationYufan Ye, Ting Zhang, Wenbin Jiang 0002, Hua Huang. 14213-14226 [doi]
- MuCAL: Contrastive Alignment for Preference-Driven KG-to-Text GenerationYifei Song, Claire Gardent. 14227-14270 [doi]
- Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal ModelsWei Wang 0378, Zhaowei Li, Qi Xu, Linfeng Li, Yiqing Cai, Botian Jiang, Hang Song, Xingcan Hu, Pengyu Wang 0006, Li Xiao. 14271-14290 [doi]
- Thought calibration: Efficient and confident test-time scalingMenghua Wu, Cai Zhou, Stephen Bates, Tommi S. Jaakkola. 14291-14305 [doi]
- Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic ComputationZiling Cheng, Meng Cao 0003, Leila Pishdad, Yanshuai Cao, Jackie CK Cheung. 14306-14333 [doi]
- QCRD: Quality-guided Contrastive Rationale Distillation for Large Language ModelsWei Wang, Zhaowei Li, Qi Xu, Yiqing Cai, Hang Song, Qi Qi, Ran Zhou, Zhida Huang, Tao Wang, Li Xiao. 14334-14345 [doi]
- SHARP: Steering Hallucination in LVLMs via Representation EngineeringJunfei Wu, Yue Ding 0009, Guofan Liu, Tianze Xia, Ziyue Huang, Dianbo Sui, Qiang Liu 0006, Shu Wu, Liang Wang 0001, Tieniu Tan. 14346-14361 [doi]
- Think, Verbalize, then Speak: Bridging Complex Thoughts and Comprehensible SpeechTony Woo, Sehun Lee, Kang-Wook Kim 0002, Gunhee Kim. 14362-14379 [doi]
- Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained SettingsSafal Shrestha, Minwu Kim, Aadim Nepal, Anubhav Shrestha, Keith W. Ross. 14380-14401 [doi]
- PPTAgent: Generating and Evaluating Presentations Beyond Text-to-SlidesHao Zheng, Xinyan Guan, Hao Kong, Wenkai Zhang, Jia Zheng 0009, Weixiang Zhou, Hongyu Lin, Yaojie Lu 0001, Xianpei Han, Le Sun 0001. 14402-14418 [doi]
- SWAM: Adaptive Sliding Window and Memory-Augmented Attention Model for Rumor DetectionMei Guo, Chen Chen 0012, Chunyan Hou, Yike Wu 0002, Xiaojie Yuan. 14419-14430 [doi]
- HydraRAG: Structured Cross-Source Enhanced Large Language Model ReasoningXingyu Tan, Xiaoyang Wang 0002, Qing Liu 0001, Xiwei Xu 0001, Xin Yuan 0004, Liming Zhu 0001, Wenjie Zhang 0001. 14431-14459 [doi]
- VRoPE: Rotary Position Embedding for Video Large Language ModelsZikang Liu, Longteng Guo, Yepeng Tang, Tongtian Yue, Junxian Cai, Kai Ma, Qingbin Liu, Xi Chen, Jing Liu 0001. 14460-14472 [doi]
- SciNLP: A Domain-Specific Benchmark for Full-Text Scientific Entity and Relation Extraction in NLPDecheng Duan, Jitong Peng, Yingyi Zhang, Chengzhi Zhang. 14473-14486 [doi]
- Think and Recall: Layer-Level Prompting for Lifelong Model EditingJinke Wang, Zenan Ying, Qi Liu, Wei Chen, Tong Xu, Huijun Hou, Zhi Zheng. 14487-14502 [doi]
- SPIRIT: Patching Speech Language Models against Jailbreak AttacksAmirbek Djanibekov, Nurdaulet Mukhituly, Kentaro Inui, Hanan Aldarmaki, Nils Lukas. 14503-14520 [doi]
- FIRE: Flexible Integration of Data Quality Ratings for Effective PretrainingLiangyu Xu, Xuemiao Zhang, Feiyu Duan, Sirui Wang, Rongxiang Weng, Jingang Wang, Xunliang Cai. 14521-14541 [doi]
- Multi-Domain Explainability of PreferencesNitay Calderon, Liat Ein-Dor, Roi Reichart. 14542-14575 [doi]
- Tuning Less, Prompting More: In-Context Preference Learning Pipeline for Natural Language TransformationShuyun Yang, Yan Zhang, Zhengmao Ye, Lei Duan, MingJie Tang. 14576-14587 [doi]
- IL-PCSR: Legal Corpus for Prior Case and Statute RetrievalShounak Paul, Dhananjay Ghumare, Pawan Goyal 0002, Saptarshi Ghosh 0001, Ashutosh Modi. 14588-14611 [doi]
- ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability KnowledgeChaoyue He, Xin Zhou 0008, Yi Wu, Xinjia Yu, Yan Zhang, Lei Zhang 0199, Di Wang 0004, Shengfei Lyu, Hong Xu 0004, Xiaoqiao Wang, Wei Liu, Chunyan Miao. 14612-14653 [doi]
- How Sememic Components Can Benefit Link Prediction for Lexico-Semantic Knowledge Graphs?Hansi Wang, Yue Wang, Qiliang Liang, Yang Liu. 14654-14673 [doi]
- WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image ClassificationYiwen Jiang, Deval Mehta, Siyuan Yan, Yaling Shen, Zimu Wang, ZongYuan Ge. 14674-14685 [doi]
- Calibration Across Layers: Understanding Calibration Evolution in LLMsAbhinav Joshi, Areeb Ahmad, Ashutosh Modi. 14686-14714 [doi]
- The discordance between embedded ethics and cultural inference in large language modelsAida Ramezani, Yang Xu. 14715-14736 [doi]
- SSA: Semantic Contamination of LLM-Driven Fake News DetectionCheng Xu 0006, Nan Yan, Shuhao Guan, Yuke Mei, M. Tahar Kechadi. 14737-14751 [doi]
- Logits-Based FinetuningJingyao Li, Senqiao Yang, Sitong Wu, Han Shi, Chuanyang Zheng, Hong Xu 0001, Jiaya Jia. 14752-14764 [doi]
- STARE at the Structure: Steering ICL Exemplar Selection with Structural AlignmentJiaqian Li, Qisheng Hu, Jing Li, Wenya Wang. 14765-14782 [doi]
- PPC-GPT: Federated Task-Specific Compression of Large Language Models via Pruning and Chain-of-Thought DistillationTao Fan 0002, Guoqiang Ma, Yuanfeng Song, Lixin Fan, Qiang Yang 0001. 14783-14794 [doi]
- Efficient Beam Search for Large Language Models Using Trie-Based DecodingBrian J. Chan, Mao Xun Huang, Jui-Hung Cheng, Chao-Ting Chen, Hen-Hsen Huang. 14795-14807 [doi]
- Power doesn't reside in size: A Low Parameter Hybrid Language Model (HLM) for Sentiment Analysis in Code-mixed dataPavan Sai Balaga, Nagasamudram Karthik, Challa Vishwanath, Raksha Sharma, Rudra Murthy, Ashish R. Mittal. 14808-14816 [doi]
- Evaluating Taxonomy Free Character Role Labeling (TF-CRL) in News Stories using Large Language ModelsDavid G. Hobson, Derek Ruths, Andrew Piper. 14817-14839 [doi]
- MIRROR: Multimodal Cognitive Reframing Therapy for Rolling with ResistanceSubin Kim, Hoonrae Kim, Jihyun Lee, Yejin Jeon, Gary Lee. 14840-14869 [doi]
- RETAIL: Towards Real-world Travel Planning for Large Language ModelsBin Deng, Yizhe Feng, Zeming Liu, Qing Wei, Xiangrong Zhu 0002, Shuai Chen, Yuanfang Guo, Yunhong Wang 0001. 14870-14902 [doi]
- Unraveling Interwoven Roles of Large Language Models in Authorship Privacy: Obfuscation, Mimicking, and VerificationTuc Nguyen, Yifan Hu, Thai Le. 14903-14919 [doi]
- Reward Model Perspectives: Whose Opinions Do Reward Models Reward?Elle. 14920-14944 [doi]
- FLRC: Fine-grained Low-Rank Compressor for Efficient LLM InferenceYu-Chen Lu, Chong-Yan Chen, Chi-Chih Chang, Yu Fang Hu, Kai-Chiang Wu. 14945-14955 [doi]
- Do You Know About My Nation? Investigating Multilingual Language Models' Cultural Literacy Through Factual KnowledgeEshaan Tanwar, Anwoy Chatterjee, Michael Saxon, Alon Albalak, William Yang Wang, Tanmoy Chakraborty 0002. 14956-14979 [doi]
- CoEvo: Coevolution of LLM and Retrieval Model for Domain-Specific Information RetrievalAng Li 0049, Yiquan Wu 0001, Yinghao Hu, Lizhi Qing, Shihang Wang, Chengyuan Liu, Tao Wu, Adam Jatowt, Ming Cai, Fei Wu 0001, Kun Kuang. 14980-14999 [doi]
- Conan-Embedding-v2: Training an LLM from Scratch for Text EmbeddingsShiyu Li, Yang Tang, Ruijie Liu, Shi-Zhe Chen, Xi Chen. 15000-15016 [doi]
- Vision-and-Language Navigation with Analogical Textual Descriptions in LLMsYue Zhang 0004, Tianyi Ma, Zun Wang, Yanyuan Qiao, Parisa KordJamshidi. 15017-15025 [doi]
- MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language ModelsXiaolong Wang, Zhaolu Kang, Wangyuxuan Zhai, Xinyue Lou, Yunghwei Lai, Ziyue Wang, Yawen Wang, Kaiyu Huang, Yile Wang, Peng Li, Yang Liu. 15026-15048 [doi]
- Mind the Gap: How BabyLMs Learn Filler-Gap DependenciesChi-Yun Chang, Xueyang Huang, Humaira Nasir, Shane Storks, Olawale Akingbade, Huteng Dai. 15049-15065 [doi]
- Paths Not Taken: Understanding and Mending the Multilingual Factual Recall PipelineMeng Lu, Ruochen Zhang 0001, Carsten Eickhoff, Ellie Pavlick. 15066-15096 [doi]
- BTC-SAM: Leveraging LLMs for Generation of Bias Test Cases for Sentiment Analysis ModelsZsolt T. Kardkovács, Lynda Djennane, Anna Field, Boualem Benatallah, Yacine Gaci, Fabio Casati, Walid Gaaloul. 15097-15113 [doi]
- Debate-to-Detect: Reformulating Misinformation Detection as a Real-World Debate with Large Language ModelsChen Han, Wenzhen Zheng, Xijin Tang. 15114-15129 [doi]
- Controllable Memorization in LLMs via Weight PruningChenjie Ni, Zhepeng Wang, Runxue Bao, Shangqian Gao, Yanfu Zhang. 15130-15145 [doi]
- Tracing L1 Interference in English Learner Writing: A Longitudinal Corpus with Error AnnotationsPoorvi Acharya, J. Elizabeth Liebl, Dhiman Goswami, Kai North, Marcos Zampieri, Antonios Anastasopoulos. 15146-15167 [doi]
- DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor SearchLei Yang, Shaoyang Xu, Jianxiang Peng, ShaoLin Zhu, Deyi Xiong. 15168-15182 [doi]
- Who is in the Spotlight: The Hidden Bias Undermining Multimodal Retrieval-Augmented GenerationJiayu Yao, Shenghua Liu, Yiwei Wang 0001, Lingrui Mei, Baolong Bi, Yuyao Ge, Zhecheng Li, Xueqi Cheng. 15183-15193 [doi]
- Let's Play Across Cultures: A Large Multilingual, Multicultural Benchmark for Assessing Language Models' Understanding of SportsPunit Kumar Singh, Nishant Kumar, Akash Ghosh, Kunal Pasad, Khushi Soni, Manisha Jaishwal, Sriparna Saha 0001, Syukron Abu Ishaq Alfarozi, Asres Temam Abagissa, Kitsuchart Pasupa, Haiqin Yang, José G. Moreno 0001. 15194-15241 [doi]
- Multilingual Federated Low-Rank Adaptation for Collaborative Content Anomaly Detection across Multilingual Social Media ParticipantsJiaxin Li, Geng Zhao, Xiaoci Zhang. 15242-15262 [doi]
- M3Retrieve: Benchmarking Multimodal Retrieval for MedicineArkadeep Acharya, Akash Ghosh, Pradeepika Verma, Kitsuchart Pasupa, Sriparna Saha 0001, Priti Singh. 15263-15276 [doi]
- The Hidden Strength of Disagreement: Unraveling the Consensus-Diversity Tradeoff in Adaptive Multi-Agent SystemsZengqing Wu, Takayuki Ito. 15277-15297 [doi]
- Friend or Foe? A Computational Investigation of Semantic False Friends across Romance LanguagesAna Sabina Uban, Liviu P. Dinu, Ioan-Bogdan Iordache, Simona Georgescu, Claudia Vlad. 15298-15312 [doi]
- KLAAD: Refining Attention Mechanisms to Reduce Societal Bias in Generative Language ModelsSeorin Kim, Dongyoung Lee, Jaejin Lee. 15313-15334 [doi]
- SeMob: Semantic Synthesis for Dynamic Urban Mobility PredictionRunfei Chen, Shuyang Jiang, Wei Huang. 15335-15355 [doi]
- DyePack: Provably Flagging Test Set Contamination in LLMs Using BackdoorsYize Cheng, Wenxiao Wang 0002, Mazda Moayeri, Soheil Feizi. 15356-15373 [doi]
- Minimal, Local, and Robust: Embedding-Only Edits for Implicit Bias in T2I ModelsFeng He, Chao Zhang, Zhixue Zhao. 15374-15392 [doi]
- Journalism-Guided Agentic In-context Learning for News Stance DetectionDahyun Lee, Jonghyeon Choi, Jiyoung Han, Kunwoo Park. 15393-15416 [doi]
- Less Is MuRE: Revisiting Shallow Knowledge Graph EmbeddingsVictor Charpenay, Steven Schockaert. 15417-15443 [doi]
- Jailbreak LLMs through Internal Stance ManipulationShuangjie Fu, Du Su, Beining Huang, Fei Sun 0001, Jingang Wang, Wei Chen 0013, Huawei Shen, Xueqi Cheng. 15444-15459 [doi]
- Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit AnalysisHaoming Huang, Yibo Yan, Jiahao Huo, Xin Zou 0001, Xinfeng Li, Kun Wang 0056, Xuming Hu. 15460-15479 [doi]
- Complex Numerical Reasoning with Numerical Semantic Pre-training FrameworkJun Zhang, Haihong E, Tianyi Hu, Yifan Zhu 0001, Meina Song, Haoran Luo 0001. 15480-15514 [doi]
- Automated Knowledge Graph Construction using Large Language Models and Sentence Complexity ModellingSydney Anuyah, Mehedi Mahmud Kaushik, Sri Rama Krishna Reddy Dwarampudi, Rakesh Shiradkar, Arjan Durresi, Sunandan Chakraborty. 15515-15539 [doi]
- OntologyRAG-Q: Resource Development and Benchmarking for Retrieval-Augmented Question Answering in Qur'anic TafsirSadam Al-Azani, Maad Alowaifeer, Alhanoof Alhunief, Ahmed Abdelali. 15540-15558 [doi]
- The Practical Impacts of Theoretical Constructs on Empathy ModelingAllison Lahnala, Charles Welch, David Jurgens, Lucie Flek. 15559-15586 [doi]
- RecBase: Generative Foundation Model Pretraining for Zero-Shot RecommendationSashuai Zhou, Weinan Gan, Qijiong Liu, Ke Lei, Jieming Zhu, Hai Huang 0013, Yan Xia 0006, Ruiming Tang, Zhenhua Dong, Zhou Zhao 0001. 15587-15599 [doi]
- Grouping Entities with Shared Properties using Multi-Facet Prompting and Property EmbeddingsAmit Gajbhiye, Thomas Bailleux, Zied Bouraoui, Luis Espinosa Anke, Steven Schockaert. 15600-15615 [doi]
- Context-Aware Hierarchical Taxonomy Generation for Scientific Papers via LLM-Guided Multi-Aspect ClusteringKun Zhu 0025, Lizi Liao, Yuxuan Gu 0004, Lei Huang 0021, Xiaocheng Feng, Bing Qin 0001. 15616-15634 [doi]
- Benchmark Profiling: Mechanistic Diagnosis of LLM BenchmarksDongjun Kim, Gyuho Shim, Yongchan Chun, MinHyuk Kim, Chanjun Park, HeuiSeok Lim. 15635-15650 [doi]
- TreeReview: A Dynamic Tree of Questions Framework for Deep and Efficient LLM-based Scientific Peer ReviewYuan Chang, Ziyue Li, Hengyuan Zhang, Yuanbo Kong, Yanru Wu, Hayden Kwok-Hay So, Zhijiang Guo, Liya Zhu, Ngai Wong 0001. 15651-15682 [doi]
- Improving Chemical Understanding of LLMs via SMILES ParsingYunhui Jang, Jaehyung Kim, Sungsoo Ahn. 15683-15698 [doi]
- Can Large Language Models Tackle Graph Partitioning?Yiheng Wu, Ningchao Ge, YanMin Li, Liwei Qian, Mengna Zhu, Haoyu Yang, Haiwen Chen, Jibing Wu. 15699-15719 [doi]
- To See a World in a Spark of Neuron: Disentangling Multi-Task Interference for Training-Free Model MergingZitao Fang, Guodong Du 0002, Shuyang Yu, Yifei Guo, YiWei Zhang, Yiyao Cao, Jing Li 0034, Ho-Kin Tang, Sim Kuan Goh. 15720-15740 [doi]
- What You Read Isn't What You Hear: Linguistic Sensitivity in Deepfake Speech DetectionBinh Nguyen, Shuju Shi, Ryan Ofman, Thai Le. 15741-15755 [doi]
- Task-Aware Resolution Optimization for Visual Large Language ModelsWeiqing Luo, Zhen Tan 0001, Yifan Li, Xinyu Zhao, Kwonjoon Lee, Behzad Dariush, Tianlong Chen 0001. 15756-15770 [doi]
- CheckEval: A reliable LLM-as-a-Judge framework for evaluating text generation using checklistsYukyung Lee, Joonghoon Kim, Jaehee Kim, Hyowon Cho, Jaewook Kang, Pilsung Kang 0001, Najoung Kim. 15771-15798 [doi]
- A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text ExplanationsLingjun Zhao, Hal Daumé III. 15799-15813 [doi]
- Boosting Multi-modal Keyphrase Prediction with Dynamic Chain-of-Thought in Vision-Language ModelsQihang Ma, Shengyu Li, Jie Tang, Dingkang Yang, Chenshaodong, Yingyi Zhang, Chao Feng, Ran Jiao. 15814-15827 [doi]
- Chart2Code53: A Large-Scale Diverse and Complex Dataset for Enhancing Chart-to-Code GenerationTianhao Niu, Yiming Cui 0001, Baoxin Wang, Xiao Xu 0005, Xin Yao, Qingfu Zhu, Dayong Wu, Shijin Wang 0001, Wanxiang Che. 15828-15844 [doi]
- The State of Multilingual LLM Safety Research: From Measuring The Language Gap To Mitigating ItZheng Xin Yong, Beyza Ermis, Marzieh Fadaee, Stephen H. Bach, Julia Kreutzer. 15845-15860 [doi]
- AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional PromptSaket S. Chaturvedi, Gaurav Bagwe, Lan Zhang 0005, Xiaoyong Yuan. 15861-15878 [doi]
- From Capabilities to Performance: Evaluating Key Functional Properties of LLM Architectures in Penetration TestingLanxiao Huang, Daksh Dave, Tyler Cody, Peter A. Beling, Ming Jin 0002. 15879-15905 [doi]
- Editing Across Languages: A Survey of Multilingual Knowledge EditingNadir Durrani, Basel Mousi, Fahim Dalvi. 15906-15918 [doi]
- Your RAG is Unfair: Exposing Fairness Vulnerabilities in Retrieval-Augmented Generation via Backdoor AttacksGaurav Bagwe, Saket S. Chaturvedi, Xiaolong Ma, Xiaoyong Yuan, Kuang-Ching Wang, Lan Zhang 0005. 15919-15937 [doi]
- Drift-Adapter: A Practical Approach to Near Zero-Downtime Embedding Model Upgrades in Vector DatabasesHarshil Vejendla. 15938-15949 [doi]
- The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral DilemmasYa Wu, Qiang Sheng 0001, Danding Wang, Guang Yang 0031, Yifan Sun, Zhengjia Wang 0001, Yuyan Bu, Juan Cao 0001. 15950-15970 [doi]
- SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer ScalingHarshil Vejendla. 15971-15978 [doi]
- ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning TasksHeng Zhou, Hejia Geng, Xiangyuan Xue, Li Kang, Yiran Qin, Zhiyong Wang 0001, Zhenfei Yin, Lei Bai 0001. 15979-15998 [doi]
- ConstraintLLM: A Neuro-Symbolic Framework for Industrial-Level Constraint ProgrammingWeichun Shi, Minghao Liu 0001, Wanting Zhang, Langchen Shi, Fuqi Jia, Feifei Ma, Jian Zhang 0001. 15999-16019 [doi]
- VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape RoomsSeungwon Lim, Sungwoong Kim, Jihwan Yu, Sungjae Lee, Jiwan Chung, Youngjae Yu. 16020-16047 [doi]
- ESC-Judge: A Framework for Comparing Emotional Support Conversational AgentsNavid Madani, Rohini K. Srihari. 16048-16065 [doi]
- Neuron-Level Differentiation of Memorization and Generalization in Large Language ModelsKo-Wei Huang, Yi-Fu Fu, Ching-Yu Tsai, Yu-Chieh Tu, Tzu-Ling Cheng, Cheng-Yu Lin, Yi-Ting Yang, Heng-Yi Liu, Keng-Te Liao 0001, Da-Cheng Juan, Shou-de Lin. 16066-16080 [doi]
- Sparse Neurons Carry Strong Signals of Question Ambiguity in LLMsZhuoxuan Zhang, Jinhao Duan, Edward Kim 0006, Kaidi Xu. 16081-16099 [doi]
- Do Slides Help? Multi-modal Context for Automatic Transcription of Conference TalksSupriti Sinhamahapatra, Jan Niehues. 16100-16110 [doi]
- Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual QueriesTianyi Lorena Yan, Robin Jia. 16111-16134 [doi]
- Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint FramesSahithya Ravi, Gabriel Herbert Sarch, Vibhav Vineet, Andrew D. Wilson, Balasaravanan Thoravi Kumaravel. 16135-16150 [doi]
- Enhancing Chain-of-Thought Reasoning via Neuron Activation Differential AnalysisYiru Tang, Kun Zhou 0002, Yingqian Min, Wayne Xin Zhao, Jing Sha, Zhichao Sheng, Shijin Wang 0001. 16151-16159 [doi]
- PakBBQ: A Culturally Adapted Bias Benchmark for QAAbdullah Hashmat, Muhammad Arham Mirza, Agha Ali Raza. 16160-16172 [doi]
- MULTIGUARD: An Efficient Approach for AI Safety Moderation Across Languages and ModalitiesSahil Verma 0003, Keegan Hines, Jeff A. Bilmes, Charlotte Siska, Luke Zettlemoyer, Hila Gonen, Chandan Singh. 16173-16187 [doi]
- Comparing human and LLM politeness strategies in free productionHaoran Zhao, Robert D. Hawkins. 16188-16216 [doi]
- ASTRA: A Negotiation Agent with Adaptive and Strategic Reasoning via Tool-integrated Action for Dynamic Offer OptimizationDeuksin Kwon, Jiwon Hae, Emma Clift, Daniel Shamsoddini, Jonathan Gratch, Gale M. Lucas. 16217-16238 [doi]
- CARMA: Enhanced Compositionality in LLMs via Advanced Regularisation and Mutual Information AlignmentNura Aljaafari, Danilo S. Carvalho, André Freitas. 16239-16259 [doi]
- MEPT: Mixture of Expert Prompt Tuning as a Manifold MapperRunjia Zeng, Guangyan Sun, Qifan Wang 0001, Tong Geng, Sohail A. Dianat, Xiaotian Han, Raghuveer Rao, Xueling Zhang, Cheng Han 0001, Lifu Huang, Dongfang Liu. 16260-16280 [doi]
- KG-CQR: Leveraging Structured Relation Representations in Knowledge Graphs for Contextual Query RetrievalChi Minh Bui, Ngoc Mai Thieu, Van Vinh Nguyen, Jason J. Jung, Khac-Hoai Nam Bui. 16281-16298 [doi]
- SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual ConnectionMaithili Joshi, Palash Nandi, Tanmoy Chakraborty 0002. 16299-16314 [doi]
- When Truthful Representations Flip Under Deceptive Instructions?Xianxuan Long, Yao Fu, Runchao Li, Mu Sheng, Haotian Yu, Xiaotian Han, Pan Li. 16315-16335 [doi]
- Can LLMs simulate the same correct solutions to free-response math problems as real students?Yuya Asano, Diane J. Litman, Erin Walker. 16336-16365 [doi]
- Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and HumansDeuksin Kwon, Kaleen Shrestha, Bin Han, Elena Hayoung Lee, Gale Lucas. 16366-16380 [doi]
- RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model MergingBowen Wang, Haiyuan Wan, Liwen Shi, Chen Yang, Peng He, Yue Ma, Haochen Han, Wenhao Li, Tiao Tan, Yongjian Li, Fangming Liu, Yifan Gong 0010, Sheng Zhang. 16381-16395 [doi]
- Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design DecisionsEmmy Liu, Amanda Bertsch, Lintang Sutawika, Lindia Tjuatja, Patrick Fernandes, Lara Marinov, Michael Chen, Shreya Singhal, Carolin Lawrence, Aditi Raghunathan, Kiril Gashteovski, Graham Neubig. 16396-16427 [doi]
- Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion DynamicsJiarui Liu 0004, Yueqi Song, Yunze Xiao, Mingqian Zheng, Lindia Tjuatja, Jana Schaich Borg, Mona T. Diab, Maarten Sap. 16428-16458 [doi]
- Linear-Time Demonstration Selection for In-Context Learning via Gradient EstimationZiniu Zhang, Zhenshuo Zhang, Dongyue Li, Lu Wang 0008, Jennifer G. Dy, Hongyang R. Zhang. 16459-16477 [doi]
- Speech Vecalign: an Embedding-based Method for Aligning Parallel Speech DocumentsChutong Meng, Philipp Koehn. 16478-16494 [doi]
- TurBLiMP: A Turkish Benchmark of Linguistic Minimal PairsEzgi Basar, Francesca Padovani, Jaap Jumelet, Arianna Bisazza. 16495-16510 [doi]
- DynamicNER: A Dynamic, Multilingual, and Fine-Grained Dataset for LLM-based Named Entity RecognitionHanjun Luo, Yingbin Jin, Yiran Wang, Xinfeng Li, Tong Shang, Xuecheng Liu, Ruizhe Chen, Kun Wang 0056, Hanan Salam, Qingsong Wen, Zuozhu Liu. 16511-16535 [doi]
- Reliable and Cost-Effective Exploratory Data Analysis via Graph-Guided RAGMossad Helali, Yutai Luo, Tae Jun Ham, Jim Plotts, Ashwin Chaugule, Jichuan Chang, Parthasarathy Ranganathan, Essam Mansour 0001. 16536-16553 [doi]
- Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process RewardsJaehoon Yun, Jiwoong Sohn, Jungwoo Park, Hyunjae Kim, Xiangru Tang, Daniel Shao, Yonghoe Koo, Minhyeok Ko, Qingyu Chen 0001, Mark Gerstein, Michael Moor, Jaewoo Kang. 16554-16571 [doi]
- Graders Should Cheat: Privileged Information Enables Expert-Level Automated EvaluationsJin Peng Zhou, Sébastien M. R. Arnold, Nan Ding 0002, Kilian Q. Weinberger, Nan Hua, Fei Sha. 16572-16590 [doi]
- SAMULE: Self-Learning Agents Enhanced by Multi-level ReflectionYubin Ge, Salvatore Romeo, Jason Cai, Monica Sunkara, Yi Zhang. 16591-16610 [doi]
- Database-Augmented Query Representation for Information RetrievalSoyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park. 16611-16633 [doi]
- The Enemy from Within: A Study of Political Delegitimization Discourse in Israeli Political SpeechNaama Rivlin-Angert, Guy Mor-Lan. 16634-16647 [doi]
- Attention Eclipse: Manipulating Attention to Bypass LLM Safety-AlignmentPedram Zaree, Md Abdullah Al Mamun, Quazi Mishkatul Alam, Yue Dong 0002, Ihsen Alouani, Nael B. Abu-Ghazaleh. 16648-16668 [doi]
- Representation Potentials of Foundation Models for Multimodal Alignment: A SurveyJianglin Lu, Hailing Wang, Yi Xu, Yizhou Wang, Kuo Yang, Yun Fu. 16669-16684 [doi]
- Draft Model Knows When to Stop: Self-Verification Speculative Decoding for Long-Form GenerationZiyin Zhang, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He 0002, Rui Wang 0015, Zhaopeng Tu. 16685-16697 [doi]
- Visual-Aware Speech Recognition for Noisy ScenariosBalaji Darur, Karan Singla. 16698-16706 [doi]
- Advancing Arabic Diacritization: Improved Datasets, Benchmarking, and State-of-the-Art ModelsAbubakr Mohamed, Hamdy Mubarak. 16707-16719 [doi]
- Implicit Values Embedded in How Humans and LLMs Complete Subjective Everyday TasksArjun Arunasalam, Madison Pickering, Z. Berkay Celik, Blase Ur. 16720-16743 [doi]
- Dynamic Retriever for In-Context Knowledge Editing via Policy OptimizationMahmud Wasif Nafee, Maiqi Jiang, Haipeng Chen, Yanfu Zhang. 16744-16757 [doi]
- LVLMs are Bad at Overhearing Human Referential CommunicationZhengxiang Wang, Weiling Li, Panagiotis Kaliosis, Owen Rambow, Susan Brennan. 16758-16782 [doi]
- Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math CapabilityRuida Wang, Yuxin Li, Yi R. Fung, Tong Zhang. 16783-16809 [doi]
- TORSO: Template-Oriented Reasoning Towards General TasksMinHyuk Kim, Seungyoon Lee, HeuiSeok Lim. 16810-16818 [doi]
- Prototypical Human-AI Collaboration Behaviors from LLM-Assisted Writing in the WildSheshera Mysore, Debarati Das 0004, Hancheng Cao, Bahareh Sarrafzadeh. 16819-16846 [doi]
- WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music ReasoningGagan Mundada, Yash Vishe, Amit Namburi, Xin Xu 0010, Zachary Novack, Julian J. McAuley, Junda Wu. 16847-16863 [doi]
- TRIAL: Token Relations and Importance Aware Late-interaction for Accurate Text RetrievalHyukkyu Kang, Injung Kim, Wook-Shin Han. 16864-16877 [doi]
- Do Large Language Models excel in Complex Logical Reasoning with Formal Language?Jin Jiang, Jianing Wang, Yuchen Yan, Yang Liu, Jianhua Zhu, Mengdi Zhang, Liangcai Gao. 16878-16903 [doi]
- Fair or Framed? Political Bias in News Articles Generated by LLMsJunho Yoo, Youhyun Shin. 16904-16930 [doi]
- ReviewRL: Towards Automated Scientific Review with RLSihang Zeng, Kai Tian, Kaiyan Zhang, Yuru Wang, Junqi Gao, Runze Liu 0002, Sa Yang, Jingxuan Li, Xinwei Long, Jiaheng Ma, Biqing Qi, Bowen Zhou 0002. 16931-16943 [doi]
- Grammar Pruning: Enabling Low-Latency Zero-Shot Task-Oriented Language Models for Edge AIOctavian Alexandru Trifan, Jason Lee Weber, Marc Titus Trifan, Alexandru Nicolau, Alexander V. Veidenbaum. 16944-16957 [doi]
- Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause FrequenciesTerrance Liu, Shuyi Wang, Daniel Preotiuc-Pietro, Yash Chandarana, Chirag Gupta. 16958-16982 [doi]
- REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge EditingHaitian Zhong, Yuhuan Liu, Ziyang Xu, Guofan Liu, Qiang Liu 0006, Shu Wu, Zhe Zhao, Liang Wang 0001, Tieniu Tan. 16983-17000 [doi]
- ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning ModelsChung-En Sun, Ge Yan, Tsui-Wei Weng. 17001-17025 [doi]
- Incorporating Diverse Perspectives in Cultural Alignment: Survey of Evaluation Benchmarks Through A Three-Dimensional FrameworkMeng-Chen Wu, Si-Chi Chin, Tess Wood, Ayush Goyal, Narayanan Sadagopan. 17026-17061 [doi]
- Are Large Language Models Chronically Online Surfers? A Dataset for Chinese Internet Meme ExplanationYubo Xie, Chenkai Wang, Zongyang Ma, Fahui Miao. 17062-17083 [doi]
- RoDEval: A Robust Word Sense Disambiguation Evaluation Framework for Large Language ModelsLuyang Zhang, Shuaimin Li, Yishuo Li, Kunpeng Kang, Kaiyuan Zhang, Cong Wang, Wenpeng Lu. 17084-17115 [doi]
- PychoAgent: Psychology-driven LLM Agents for Explainable Panic Prediction on Social Media during Sudden Disaster EventsMengzhu Liu, Zhengqiu Zhu, Chuan Ai, Chen Gao 0001, Xinghong Li, Lingnan He, Kaisheng Lai, Yingfeng Chen, Xin Lu, Yong Li 0008, Quanjun Yin. 17116-17134 [doi]
- Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs' ReasoningZezhong Wang 0004, Xingshan Zeng, Weiwen Liu, Yufei Wang 0005, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang 0002, Qun Liu 0001, Kam-Fai Wong. 17135-17148 [doi]
- Inter-sentence Context Modeling and Structure-aware Representation Enhancement for Conversational Sentiment Quadruple ExtractionYu Zhang, Zhaoman Zhong, Huihui Lv. 17149-17159 [doi]
- Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined RewardsXiaolong Wei, Bo Lu, Xingyu Zhang, Zhejun Zhao, Dongdong Shen, Long Xia, Dawei Yin 0001. 17160-17186 [doi]
- Governance in Motion: Co-evolution of Constitutions and AI models for Scalable SafetyChenhao Huang, Ziyu Shen, Yicong Ren, Huiyuan Zheng, Jiazheng Zhang, Mingxu Chai, Ming Zhang 0030, Shihan Dou, Fan Mo, Jie Shi, Tao Gui, Qi Zhang 0001, Xuanjing Huang 0001. 17187-17210 [doi]
- Web Intellectual Property at Risk: Preventing Unauthorized Real-Time Retrieval by Large Language ModelsYisheng Zhong, Yizhu Wen, Junfeng Guo, Mehran Kafai, Heng Huang, Hanqing Guo, Zhuangdi Zhu. 17211-17224 [doi]
- SciEvent: Benchmarking Multi-domain Scientific Event ExtractionBofu Dong, Pritesh Shah, Sumedh Sonawane, Tiyasha Banerjee, Erin Brady, Xinya Du, Ming Jiang. 17225-17255 [doi]
- Media Source Matters More Than Content: Unveiling Political Bias in LLM-Generated CitationsSunhao Dai, Zhanshuo Cao, Wenjie Wang 0007, Liang Pang 0001, Jun Xu 0001, See-Kiong Ng, Tat-Seng Chua. 17256-17276 [doi]
- RJE: A Retrieval-Judgment-Exploration Framework for Efficient Knowledge Graph Question Answering with LLMsCan Lin, Zhengwang Jiang, Ling Zheng, Qi Zhao, Yuhang Zhang, Qi Song, Wangqiu Zhou. 17277-17294 [doi]
- Bias Mitigation or Cultural Commonsense? Evaluating LLMs with a Japanese DatasetTaisei Yamamoto, Ryoma Kumon, Danushka Bollegala, Hitomi Yanaka. 17295-17313 [doi]
- Chameleon LLMs: User Personas Influence Chatbot Personality ShiftsJane Xing, Tianyi Niu, Shashank Srivastava. 17314-17332 [doi]
- GuessingGame: Measuring the Informativeness of Open-Ended Questions in Large Language ModelsDylan Hutson, Daniel Vennemeyer, Aneesh Deshmukh, Justin Zhan, Tianyu Jiang. 17333-17349 [doi]
- SynC-LLM: Generation of Large-Scale Synthetic Circuit Code with Hierarchical Language ModelsShang Liu, Yao Lu, Wenji Fang, Jing Wang, Zhiyao Xie. 17350-17365 [doi]
- Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug ErrorsZhiyu Yang, Shuo Wang, Yukun Yan, Yang Deng 0002. 17366-17381 [doi]
- Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inferenceLibo Zhang, Zhaoning Zhang, Xubaizhou, Rui Li, Zhiliang Tian, Songzhu Mei, Dongsheng Li. 17382-17395 [doi]
- V-SEAM: Visual Semantic Editing and Attention Modulating for Causal Interpretability of Vision-Language ModelsQidong Wang, Junjie Hu, Ming Jiang. 17396-17420 [doi]
- LORAXBENCH: A Multitask, Multilingual Benchmark Suite for 20 Indonesian LanguagesAlham Fikri Aji, Trevor Cohn. 17421-17446 [doi]
- MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference LearningJingyan Shen, Jiarui Yao, Rui Yang 0010, Yifan Sun, Feng Luo, Rui Pan 0002, Tong Zhang 0001, Han Zhao 0002. 17447-17463 [doi]
- SAFE: Schema-Driven Approximate Distance Join for Efficient Knowledge Graph QueryingSangoh Lee, Sungho Park, Wook-Shin Han. 17464-17489 [doi]
- Structured Preference Optimization for Vision-Language Long-Horizon Task PlanningXiwen Liang, Min Lin, Weiqi Ruan, Rongtao Xu, Yuecheng Liu, Jiaqi Chen, Bingqian Lin, Yuzheng Zhuang, Xiaodan Liang. 17490-17515 [doi]
- Position: LLMs Can be Good Tutors in English EducationJingheng Ye, Shen Wang 0005, Deqing Zou, Yibo Yan, Kun Wang 0042, Hai-Tao Zheng 0002, Ruitong Liu, Zenglin Xu, Irwin King, Philip S. Yu, Qingsong Wen. 17516-17535 [doi]
- CLLMate: A Multimodal Benchmark for Weather and Climate Events ForecastingHaobo Li 0003, Zhaowei Wang, Jiachen Wang 0001, Yueya Wang, Alexis Kai-Hon Lau, Huamin Qu. 17536-17562 [doi]
- Extracting and Combining Abilities For Building Multi-lingual Ability-enhanced Large Language ModelsZhipeng Chen 0001, Kun Zhou 0002, Liang Song, Wayne Xin Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen. 17563-17580 [doi]
- Evaluating the Effectiveness and Scalability of LLM-Based Data Augmentation for RetrievalPranjal A. Chitale, Bishal Santra, Yashoteja Prabhu, Amit Sharma 0007. 17581-17617 [doi]
- Temporal Referential Consistency: Do LLMs Favor Sequences Over Absolute Time References?Ashutosh Bajpai, Tanmoy Chakraborty 0002. 17618-17636 [doi]
- MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language ModelsZixin Chen, Hongzhan Lin 0001, Kaixin Li, Ziyang Luo, Yayue Deng, Jing Ma 0004. 17637-17659 [doi]
- Multi-perspective Analysis of Large Language Model Domain Specialization: An Experiment in Accounting Audit Procedures GenerationYusuke Noro. 17660-17682 [doi]
- Generator-Assistant Stepwise Rollback Framework for Large Language Model AgentXingzuo Li, Kehai Chen, Yunfei Long, Xuefeng Bai 0001, Yong Xu 0001, Min Zhang 0005. 17683-17700 [doi]
- DocAgent: An Agentic Framework for Multi-Modal Long-Context Document UnderstandingLi Sun, Liu He, Shuyue Jia, Yangfan He, Chenyu You. 17701-17716 [doi]
- EasyRec: Simple yet Effective Language Models for RecommendationXubin Ren, Chao Huang 0001. 17717-17732 [doi]
- From Automation to Autonomy: A Survey on Large Language Models in Scientific DiscoveryTianshi Zheng, Zheye Deng, Hong Ting Tsang, Weiqi Wang 0001, Jiaxin Bai, Zihao Wang 0001, Yangqiu Song. 17733-17750 [doi]
- Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLMsZhen Xiong, Yujun Cai, Zhecheng Li, Yiwei Wang 0001. 17751-17763 [doi]
- ViPE: Visual Perception in Parameter Space for Efficient Video-Language UnderstandingShichen Lu, Tongtian Yue, Longteng Guo, Handong Li, Xingjian He, Si Liu 0001, Jing Liu 0001. 17764-17775 [doi]
- Alignment for Efficient Tool Calling of Large Language ModelsHongshen Xu, Zihan Wang, Zichen Zhu, Lei Pan, Xingyu Chen, Shuai Fan 0005, Lu Chen 0002, Kai Yu 0004. 17776-17792 [doi]
- ToM: Leveraging Tree-oriented MapReduce for Long-Context Reasoning in Large Language ModelsJiani Guo, Zuchao Li, Jie Wu 0001, Qianren Wang, Yun Li 0011, Lefei Zhang, Hai Zhao 0001, Yu-Jiu Yang 0001. 17793-17812 [doi]
- BANMIME: Misogyny Detection with Metaphor Explanation on Bangla MemesMd Ayon Mia, Akm Moshiur Rahman Mazumder, Khadiza Sultana Sayma, Md Fahim, Md Tahmid Hasan Fuad, Muhammad Ibrahim Khan, AKM Mahbubur Rahman. 17813-17839 [doi]
- Phi: Preference Hijacking in Multi-modal Large Language Models at Inference TimeYifan Lan, Yuanpu Cao, Weitong Zhang, Lu Lin 0001, Jinghui Chen. 17840-17865 [doi]
- Retrieval-augmented GUI Agents with Generative GuidelinesRan Xu 0002, Kaixin Ma, Wenhao Yu 0002, Hongming Zhang 0009, Joyce C. Ho, Carl Yang 0001, Dong Yu 0001. 17866-17875 [doi]
- COAS2W: A Chinese Older-Adults Spoken-to-Written Transformation Corpus with Context AwarenessChun Kang, Zhigu Qian, Zhen Fu, Jiaojiao Fu, Yangfan Zhou 0002. 17876-17895 [doi]
- Answer Convergence as a Signal for Early Stopping in ReasoningXin Liu, Lu Wang. 17896-17907 [doi]
- VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference FactsXin Liu, LeChen Zhang, Sheza Munir, Yiyang Gu, Lu Wang. 17908-17925 [doi]
- SQUAB: Evaluating LLM robustness to Ambiguous and Unanswerable Questions in Semantic ParsingSimone Papicchio, Luca Cagliero, Paolo Papotti. 17926-17946 [doi]
- Reliable Evaluation and Benchmarks for Statement AutoformalizationAuguste Poiroux, Gail Weiss, Viktor Kuncak, Antoine Bosselut. 17947-17969 [doi]
- VisBias: Measuring Explicit and Implicit Social Biases in Vision Language ModelsJen-tse Huang 0001, Jiantong Qin, Jianping Zhang 0002, Youliang Yuan, Wenxuan Wang 0001, Jieyu Zhao 0001. 17970-17993 [doi]
- Less Is More? Examining Fairness in Pruned Large Language Models for Summarising OpinionsNannan Huang, Haytham M. Fayek, Xiuzhen Zhang 0001. 17994-18018 [doi]
- AI Sees Your Location - But With A Bias Toward The Wealthy WorldJingyuan Huang, Jen-tse Huang 0001, Ziyi Liu, Xiaoyuan Liu, Wenxuan Wang 0001, Jieyu Zhao 0001. 18019-18039 [doi]
- Faster In-Context Learning for LLMs via N-Gram Trie Speculative DecodingJinglin Chen, Qiwei Li 0002, Zuchao Li, Baoyuan Qi, Liu Guoming, Haojun Ai, Hai Zhao 0001, Ping Wang 0028. 18040-18051 [doi]
- From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMsMuhammad Farid Adilazuarda, Chen Cecilia Liu, Iryna Gurevych, Alham Fikri Aji. 18052-18079 [doi]
- Iterative Prompt Refinement for Safer Text-to-Image GenerationJinwoo Jeon, Junhyeok Oh, Hayeong Lee, Byung Jun Lee. 18080-18096 [doi]
- Language Models as Continuous Self-Evolving Data EngineersPeidong Wang, Ming Wang 0006, Zhiming Ma, Xiaocui Yang, Shi Feng 0001, Daling Wang, Yifei Zhang 0003, Kaisong Song. 18097-18116 [doi]
- Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative InferenceHua Cai, Shuang Zhao, Liang Zhang, Xuli Shen, Qing Xu 0017, Weilin Shen, Zihao Wen, Tianke Ban. 18117-18131 [doi]
- Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading ScenariosYunkai Dang, Mengxi Gao, Yibo Yan, Xin Zou 0001, Yanggan Gu, Jungang Li, Jingyu Wang, Peijie Jiang, Aiwei Liu, Jia Liu, Xuming Hu. 18132-18173 [doi]
- Evaluating and Aligning Human Economic Risk Preferences in LLMsJiaxin Liu, Yixuan Tang, Yi Yang, Kar Yan Tam. 18174-18188 [doi]
- Ensembling Prompting Strategies for Zero-Shot Hierarchical Text Classification with Large Language ModelsMingxuan Xia, Zhijie Jiang, Haobo Wang 0001, Junbo Zhao 0002, Tianlei Hu, Gang Chen 0001. 18189-18208 [doi]
- Improbable Bigrams Expose Vulnerabilities of Incomplete Tokens in Byte-Level TokenizersEugene Jang, Kimin Lee, Jin-Woo Chung, Keuntae Park, Seungwon Shin 0001. 18209-18216 [doi]
- UI-Hawk: Unleashing the Screen Stream Understanding for Mobile GUI AgentsJiwen Zhang, Ya-Qi Yu, Minghui Liao, Wentao Li, Jihao Wu, Zhongyu Wei. 18217-18236 [doi]
- UniDebugger: Hierarchical Multi-Agent Framework for Unified Software DebuggingCheryl Lee, Chunqiu Steven Xia, Longji Yang, Jen-tse Huang 0001, Zhouruixin Zhu, Lingming Zhang 0001, Michael R. Lyu. 18237-18266 [doi]
- Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode TheoryMing Li, Nan Zhang, Chenrui Fan, Hong Jiao, Yanbin Fu, Sydney Peters, Qingshu Xu, Robert Lissitz, Tianyi Zhou. 18267-18288 [doi]
- Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented GenerationKaikai An, Fangkai Yang, Liqun Li, Junting Lu, Sitao Cheng, Shuzheng Si, Lu Wang 0029, Pu Zhao 0004, Lele Cao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang 0001, Baobao Chang. 18289-18308 [doi]
- Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreementGabriele Sarti, Vilém Zouhar, Malvina Nissim, Arianna Bisazza. 18309-18326 [doi]
- STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language ModelsKai Chen, Zihao He, Taiwei Shi, Kristina Lerman. 18327-18355 [doi]
- Combining Constrained and Unconstrained Decoding via Boosting: BoostCD and Its Application to Information ExtractionMarija Sakota, Robert West 0001. 18356-18371 [doi]
- MultiLogicNMR(er): A Benchmark and Neural-Symbolic Framework for Non-monotonic Reasoning with Multiple ExtensionsYeliang Xiu, Yongmei Liu. 18372-18405 [doi]
- Beyond Demographics: Enhancing Cultural Value Survey Simulation with Multi-Stage Personality-Driven Cognitive ReasoningHaijiang Liu, Qiyuan Li, Chao Gao 0014, Yong Cao 0001, Xiangyu Xu, Xun Wu, Daniel Hershcovich, Jinguang Gu. 18406-18428 [doi]
- CrystalICL: Enabling In-Context Learning for Crystal GenerationRuobing Wang, Qiaoyu Tan, Yili Wang, Ying Wang, Xin Wang. 18429-18444 [doi]
- Towards a Unified Paradigm of Concept Editing in Large Language ModelsZhuowen Han, Xinwei Wu 0001, Dan Shi 0001, Renren Jin, Deyi Xiong. 18445-18461 [doi]
- Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language ModelsKaiyan Chang 0001, Yonghao Shi, Chenglong Wang 0002, Hang Zhou, Chi Hu, Xiaoqian Liu, Yingfeng Luo, Yuan Ge 0001, Tong Xiao 0001, Jingbo Zhu. 18462-18477 [doi]
- Dynamic Expert Specialization: Towards Catastrophic Forgetting-Free Multi-Domain MoE AdaptationJunzhuo Li, Bo Wang, Xiuze Zhou, Xuming Hu. 18478-18493 [doi]
- RRInf: Efficient Influence Function Estimation via Ridge Regression for Large Language Models and Text-to-Image Diffusion ModelsZhuozhuo Tu, Cheng Chen, Yuxuan Du. 18494-18507 [doi]
- Evaluating Spatiotemporal Consistency in Automatically Generated Sewing InstructionsLuisa Geiger, Mareike Hartmann, Michael Sullivan, Alexander Koller. 18508-18525 [doi]
- MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language ModelsZhen Zhang, Yifan Yang, Kai Zhen, Nathan Susanj, Athanasios Mouchtaris, Siegfried Kunzmann, Zheng Zhang. 18526-18543 [doi]
- Procedural Environment Generation for Tool-Use AgentsMichael Sullivan, Mareike Hartmann, Alexander Koller. 18544-18562 [doi]
- FacLens: Transferable Probe for Foreseeing Non-Factuality in Fact-Seeking Question Answering of Large Language ModelsYanling Wang, Haoyang Li 0015, Hao Zou, Jing Zhang 0001, Xinlei He 0001, Qi Li 0002, Ke Xu 0002. 18563-18582 [doi]
- OMS: On-the-fly, Multi-Objective, Self-Reflective Ad Keyword Generation via LLM AgentBowen Chen, Zhao Wang, Shingo Takamatsu. 18583-18601 [doi]
- Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced AgentsGuangfu Guo, Xiaoqian Lu, Yue Feng. 18602-18616 [doi]
- TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language ModelsAsif Hanif, Maha Tufail Agro, Fahad Shamshad, Karthik Nandakumar. 18617-18633 [doi]
- Can LLMs be Literary Companions?: Analysing LLMs on Bengali Figures of Speech IdentificationSourav Das, Kripabandhu Ghosh. 18634-18656 [doi]
- Group-SAE: Efficient Training of Sparse Autoencoders for Large Language Models via Layer GroupsDavide Ghilardi, Federico Belotti, Marco Molinari 0002, Tao Ma, Matteo Palmonari. 18657-18677 [doi]
- Retrieval over Classification: Integrating Relation Semantics for Multimodal Relation ExtractionLei Hei, Tingjing Liao, Peiyingxin, Yiyang Qi, Jiaqi Wang, Ruiting Li, Feiliang Ren. 18678-18693 [doi]
- PunMemeCN: A Benchmark to Explore Vision-Language Models' Understanding of Chinese Pun MemesZhijun Xu, Siyu Yuan, Yiqiao Zhang, Jingyu Sun, Tong Zheng, Deqing Yang. 18694-18710 [doi]
- UltraIF: Advancing Instruction Following from the WildKaikai An, Li Sheng, Ganqu Cui, Shuzheng Si, Ning Ding, Yu Cheng, Baobao Chang. 18711-18726 [doi]
- Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection FrameworkHongyi Tang, Zhihao Zhu, Yi Yang. 18727-18740 [doi]
- TreeRare: Syntax Tree-Guided Retrieval and Reasoning for Knowledge-Intensive Question AnsweringBoyi Zhang, Zhuo Liu, Hangfeng He 0001. 18741-18762 [doi]
- Mapping Toxic Comments Across Demographics: A Dataset from German Public BroadcastingJan Fillies, Michael Peter Hoffmann, Rebecca Reichel, Roman Salzwedel, Sven Bodemer, Adrian Paschke. 18763-18779 [doi]
- Small Models, Big Results: Achieving Superior Intent Extraction through DecompositionDanielle Cohen, Yoni Halpern, Noam Kahlon, Joel Oren, Omri Berkovitch, Sapir Caduri, Ido Dagan, Anatoly Efros. 18780-18799 [doi]
- On Pruning State-Space LLMsTamer Ghattas, Michael Hassid, Roy Schwartz 0001. 18800-18814 [doi]
- An Orthogonal High-Rank Adaptation for Large Language ModelsXin Zhang 0100, Guang-Ze Chen, Shuzhen Li, Zhulin Liu, C. L. Philip Chen, Tong Zhang 0015. 18815-18833 [doi]
- BSFA: Leveraging the Subspace Dichotomy to Accelerate Neural Network TrainingWenjie Zhou, Bohan Wang, Wei Chen, Xueqi Cheng. 18834-18849 [doi]
- Debatable Intelligence: Benchmarking LLM Judges via Debate Speech EvaluationNoy Sternlicht, Ariel Gera, Roy Bar-Haim, Tom Hope, Noam Slonim. 18850-18869 [doi]
- METok: Multi-Stage Event-based Token Compression for Efficient Long Video UnderstandingMengyue Wang, Shuo Chen 0014, Kristian Kersting, Volker Tresp, Yunpu Ma. 18870-18884 [doi]
- VisiPruner: Decoding Discontinuous Cross-Modal Dynamics for Efficient Multimodal LLMsYingqi Fan, Anhao Zhao, JinLan Fu, Junlong Tong, Hui Su, Yijie Pan, Wei Zhang 0185, Xiaoyu Shen 0001. 18885-18902 [doi]
- Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender SystemsSong Jin, Juntian Zhang, Yuhan Liu 0023, Xun Zhang, Yufei Zhang, Guojun Yin, Fei Jiang, Wei Lin, Rui Yan 0001. 18903-18920 [doi]
- SheetDesigner: MLLM-Powered Spreadsheet Layout Generation with Rule-Based and Vision-Based ReflectionQin Chen, Yuanyi Ren, Xiaojun Ma 0001, Mugeng Liu, Shi Han, Dongmei Zhang 0001. 18921-18939 [doi]
- CAIR: Counterfactual-based Agent Influence Ranker for Agentic AI WorkflowsAmit Giloni, Chiara Picardi, Roy Betser, Shamik Bose, Aishvariya Priya Rathina Sabapathy, Roman Vainshtein. 18940-18966 [doi]
- ReSURE: Regularizing Supervision Unreliability for Multi-turn Dialogue Fine-tuningYiming Du, Yifan Xiang, Bin Liang 0004, Dahua Lin, Kam-Fai Wong, Fei Tan 0002. 18967-18985 [doi]
- Precise In-Parameter Concept Erasure in Large Language ModelsYoav Gur-Arieh, Clara Suslik, Yihuai Hong, Fazl Barez, Mor Geva. 18986-19006 [doi]
- PhonoThink: Improving Large Language Models' Reasoning on Chinese Phonological AmbiguitiesJianfei Ma, Zhaoxin Feng, Emmanuele Chersoni, Huacheng Song, Ziqi Zhang. 19007-19022 [doi]
- SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQLJimin Lee 0001, Ingeol Baek, Byeongjeong Kim, Hyunkyung Bae, Hwanhee Lee. 19023-19035 [doi]
- ExpandR: Teaching Dense Retrievers Beyond Queries with LLM GuidanceSijia Yao, Pengcheng Huang 0004, Zhenghao Liu 0001, Yu Gu 0002, Yukun Yan, Shi Yu 0001, Ge Yu 0001. 19036-19054 [doi]
- Anecdoctoring: Automated Red-Teaming Across Language and PlaceAlejandro Cuevas, Saloni Dash, Bharat Kumar Nayak, Dan Vann, Madeleine I. G. Daepp. 19055-19074 [doi]
- ACING: Actor-Critic for Instruction Learning in Black-Box LLMsSalma Kharrat, Fares Fourati, Marco Canini. 19075-19102 [doi]
- Women, Infamous, and Exotic Beings: A Comparative Study of Honorific Usages in Wikipedia and LLMs for Bengali and HindiSourabrata Mukherjee, Atharva Mehta, Sougata Saha, Akhil Arora, Monojit Choudhury. 19103-19126 [doi]
- Process-Supervised Reward Models for Verifying Clinical Note Generation: A Scalable Approach Guided by Domain ExpertiseHanyin Wang, Chufan Gao, Qiping Xu, Bolun Liu, Guleid Hussein, Hariprasad Reddy Korsapati, Mohamad El Labban, Kingsley Iheasirim, Mohamed Hassan, Gokhan Anil, Brian Bartlett, Jimeng Sun 0001. 19127-19147 [doi]
- GCML: Gradient Coherence Guided Meta-Learning for Cross-Domain Emerging Topic Rumor DetectionZejiang He, Jingyuan Huang, Menglong Lu, Zhen Huang 0006, Shanshan Liu, Zhiliang Tian, Dong Sheng Li 0001. 19148-19162 [doi]
- Can LLMs Generate and Solve Linguistic Olympiad Puzzles?Neh Majmudar, Elena Filatova. 19163-19200 [doi]
- E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and ReasoningZihan Liao, Jun Wang, Hang Yu, Lingxiao Wei, Jianguo Li, Jun Wang, Wei Zhang. 19201-19230 [doi]
- DivScore: Zero-Shot Detection of LLM-Generated Text in Specialized DomainsZhihui Chen, Kai He, Yucheng Huang, Yunxiao Zhu, Mengling Feng. 19231-19253 [doi]
- Multi-Document Event Extraction Using Large and Small Language ModelsQingkai Min, Zitian Qu, Qipeng Guo, Xiangkun Hu, Zheng Zhang, Yue Zhang 0004. 19254-19285 [doi]
- MA-GTS: A Multi-Agent Framework for Solving Complex Graph Problems in Real-World ApplicationsZike Yuan, Ming Liu, Hui Wang, Bing Qin. 19286-19304 [doi]
- Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio EncodersWeiqiao Shan, Yuang Li, Yuhao Zhang, Yingfeng Luo, Chen Xu, Xiaofeng Zhao, Long Meng, Yunfei Lu, Min Zhang, Hao Yang, Tong Xiao, Jingbo Zhu. 19305-19320 [doi]
- CIKT: A Collaborative and Iterative Knowledge Tracing Framework with Large Language ModelsRunze Li, Siyu Wu, Jun Wang, Wei Zhang. 19321-19334 [doi]
- Mitigating Hallucinations in LM-Based TTS Models via Distribution Alignment Using GFlowNetsChenlin Liu, Minghui Fang 0002, Patrick Zhang, Wei Zhou, Jie Gao, Jiqing Han 0001. 19335-19353 [doi]
- MolErr2Fix: Benchmarking LLM Trustworthiness in Chemistry via Modular Error Detection, Localization, Explanation, and CorrectionYuyang Wu, Jinhui Ye, Shuhao Zhang, Lu Dai, Yonatan Bisk, Olexandr Isayev. 19354-19371 [doi]
- Shared Path: Unraveling Memorization in Multilingual LLMs through Language SimilaritiesXiaoyu Luo, Yiyi Chen 0002, Johannes Bjerva, Qiongxiu Li. 19372-19388 [doi]
- Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented GenerationChaojun Nie, Jun Zhou, Guanxiang Wang, Shisong Wu, Zichen Wang. 19389-19406 [doi]
- LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal RetrievalJian Zhang 0002, Junyi Guo, Junyi Yuan, Huanda Lu, Yanlin Zhou, Fangyu Wu, Qiufeng Wang, Dongming Lu. 19407-19417 [doi]
- Artificial Impressions: Evaluating Large Language Model Behavior Through the Lens of Trait ImpressionsNicholas Deas, Kathleen McKeown. 19418-19444 [doi]
- Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference OptimizationJiulong Wu, Zhengliang Shi, Shuaiqiang Wang, Jizhou Huang, Dawei Yin 0001, Lingyong Yan, Min Cao, Min Zhang 0006. 19445-19461 [doi]
- 3DS: Medical Domain Adaptation of LLMs via Decomposed Difficulty-based Data SelectionHongxin Ding, Yue Fang, Runchuan Zhu, Xinke Jiang, Jinyang Zhang, Yongxin Xu, Weibin Liao, Xu Chu, Junfeng Zhao 0001, Yasha Wang. 19462-19484 [doi]
- InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV ShowsKirolos Ataallah, Eslam Mohamed Bakr, Mahmoud Ahmed, Chenhui Gou, Khushbu Pahwa, Jian Ding 0001, Mohamed Elhoseiny. 19485-19512 [doi]
- Intrinsic Test of Unlearning Using Parametric Knowledge TracesYihuai Hong, Lei Yu, Haiqin Yang, Shauli Ravfogel, Mor Geva. 19513-19535 [doi]
- Speculative Streaming: Efficient and Scalable Speculative Decoding with Multi-Stream AttentionNikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Antonie Lin, Mohammad Rastegari, Mahyar Najibi. 19536-19559 [doi]
- Evaluating Cognitive-Behavioral Fixation via Multimodal User Viewing Patterns on Social MediaYujie Wang, Yunwei Zhao, Jing Yang, Han Han, Shiguang Shan, Jie Zhang 0071. 19560-19572 [doi]
- Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMsMario Sanz-Guerrero, Minh Duc Bui, Katharina von der Wense. 19573-19583 [doi]
- VocalNet: Speech LLMs with Multi-Token Prediction for Faster and High-Quality GenerationYuhao Wang, Heyang Liu, Ziyang Cheng, Ronghua Wu, Qunshan Gu, Yanfeng Wang, Yu Wang. 19584-19601 [doi]
- Path Drift in Large Reasoning Models: How First-Person Commitments Override SafetyYuyi Huang, Runzhe Zhan, Lidia S. Chao, Ailin Tao, Derek F. Wong. 19602-19616 [doi]
- CBP-Tuning: Efficient Local Customization for Black-box Large Language ModelsJiaxuan Zhao, Naibin Gu, Yuchen Feng, Xiyu Liu 0003, Peng Fu 0008, Zheng Lin 0001, Weiping Wang 0005. 19617-19630 [doi]
- Beyond the Score: Uncertainty-Calibrated LLMs for Automated Essay AssessmentAhmed Karim, Qiao Wang, Zheng Yuan. 19631-19636 [doi]
- Humans Hallucinate Too: Language Models Identify and Correct Subjective Annotation Errors With Label-in-a-Haystack PromptsGeorgios Chochlakis, Peter Wu, Arjun Bedi, Marcus Ma, Kristina Lerman, Shrikanth Narayanan. 19637-19656 [doi]
- Do It Yourself (DIY): Modifying Images for Poems in a Zero-Shot Setting Using Weighted Prompt ManipulationSofia Jamil, Kotla Sai Charan, Sriparna Saha 0001, Koustava Goswami, K. J. Joseph. 19657-19665 [doi]
- Looking Beyond Text: Reducing Language Bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image GuidanceHaozhe Zhao, Shuzheng Si, Liang Chen 0024, Yichi Zhang, Maosong Sun 0001, Baobao Chang, Minjia Zhang. 19666-19690 [doi]
- Who Holds the Pen? Caricature and Perspective in LLM Retellings of HistoryLubna Zahan Lamia, Mabsur Fatin Bin Hossain, Md. Mosaddek Khan. 19691-19710 [doi]
- DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMsMinxuan Lv, Zhenpeng Su, Leiyu Pan, Yizhe Xiong, Zijia Lin, Hui Chen 0013, Wei Zhou 0019, Jungong Han, Guiguang Ding, Wenwu Ou, Di Zhang 0026, Kun Gai, Songlin Hu 0001. 19711-19722 [doi]
- Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing UncertaintyPeilin Wu, Mian Zhang, Xinlu Zhang, Xinya Du, Zhiyu Chen 0002. 19723-19734 [doi]
- Child-Directed Language Does Not Consistently Boost Syntax Learning in Language ModelsFrancesca Padovani, Jaap Jumelet, Yevgen Matusevych, Arianna Bisazza. 19735-19756 [doi]
- Benchmarking Debiasing Methods for LLM-based Parameter EstimatesNicolas Audinet de Pieuchon, Adel Daoud, Connor Thomas Jerzak, Moa Johansson 0001, Richard Johansson. 19757-19772 [doi]
- (Almost) Free Modality Stitching of Foundation ModelsJaisidh Singh, Diganta Misra, Boris Knyazev, Antonio Orvieto. 19773-19789 [doi]
- VERITAS: Leveraging Vision Priors and Expert Fusion to Improve Multimodal DataTingqiao Xu, Ziru Zeng, Jiayu Chen. 19790-19809 [doi]
- Rescorla-Wagner Steering of LLMs for Undesired Behaviors over Disproportionate Inappropriate ContextRushi Wang, Jiateng Liu, Cheng Qian 0008, Yifan Shen, Yanzhou Pan, Zhaozhuo Xu, Ahmed Abbasi, Heng Ji 0001, Denghui Zhang. 19810-19845 [doi]
- Exploring Artificial Image Generation for Stance DetectionZhengkang Zhang, Zhongqing Wang, Guodong Zhou. 19846-19861 [doi]
- Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope SpeechJonathan Pofcher, Christopher M. Homan, Randall Sell, Ashiqur R. KhudaBukhsh. 19862-19888 [doi]
- Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMsAndong Hua, Kenan Tang, Chenhe Gu, Jindong Gu, Eric Wong 0001, Yao Qin 0001. 19889-19899 [doi]
- Topic Coverage-based Demonstration Retrieval for In-Context LearningWonbin Kweon, SeongKu Kang, Runchu Tian, Pengcheng Jiang, Jiawei Han 0001, Hwanjo Yu. 19900-19912 [doi]
- On the Same Wavelength? Evaluating Pragmatic Reasoning in Language Models across Broad ConceptsLinlu Qiu, Cedegao E. Zhang, Joshua B. Tenenbaum, Yoon Kim, Roger P. Levy. 19913-19935 [doi]
- MuseScorer: Idea Originality Scoring At ScaleAli Sarosh Bangash, Krish Veera, Ishfat Abrar Islam, Raiyan Abdul Baten. 19936-19954 [doi]
- SAFENUDGE: Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offsJoão Fonseca, Andrew Bell, Julia Stoyanovich. 19955-19969 [doi]
- RaDeR: Reasoning-aware Dense Retrieval ModelsDebrup Das, Seán Ó Nualláin, Razieh Rahimi. 19970-19997 [doi]
- A Culturally-diverse Multilingual Multimodal Video Benchmark & ModelBhuiyan Sanjid Shafique, Ashmal Vayani, Muhammad Maaz 0001, Hanoona Abdul Rasheed, Dinura Dissanayake, Mohammed Irfan Kurpath, Yahya Hmaiti, Go Inoue, Jean Lahoud, Md. Safirur Rashid, Shadid Intisar Quasem, Maheen Fatima, Franco Vidal, Mykola Maslych, Ketan Pravin More, Sanoojan Baliah, Hasindri Watawana, Yuhao Li, Fabian Farestam, Leon Schaller, Roman Tymtsiv, Simon Weber 0002, Hisham Cholakkal, Ivan Laptev, Shin'ichi Satoh 0001, Michael Felsberg, Mubarak Shah, Salman H. Khan 0001, Fahad Shahbaz Khan. 19998-20022 [doi]
- DRES: Fake news detection by dynamic representation and ensemble selectionFaramarz Farhangian, Leandro Augusto Ensina, George D. C. Cavalcanti, Rafael M. O. Cruz. 20023-20041 [doi]
- A Graph-Theoretical Framework for Analyzing the Behavior of Causal Language ModelsRashin Rahnamoun, Mehrnoush Shamsfard. 20042-20073 [doi]
- Membership and Memorization in LLM Knowledge DistillationZiqi Zhang, Ali Shahin Shamsabadi, Hanxiao Lu, Yifeng Cai, Hamed Haddadi. 20074-20084 [doi]
- Balanced Multi-Factor In-Context Learning for Multilingual Large Language ModelsMasahiro Kaneko, Alham Fikri Aji, Timothy Baldwin. 20085-20104 [doi]
- Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-kChihiro Taguchi, Seiji Maekawa, Nikita Bhutani. 20105-20130 [doi]
- Languages Still Left Behind: Toward a Better Multilingual Machine Translation BenchmarkChihiro Taguchi, Seng Mai, Keita Kurabe, Yusuke Sakai 0010, Georgina Agyei, Soudabeh Eslami, David Chiang 0001. 20131-20143 [doi]
- Think Globally, Group Locally: Evaluating LLMs Using Multi-Lingual Word Grouping GamesCésar Guerra-Solano, Zhuochun Li, Xiang Lorraine Li. 20144-20165 [doi]
- Pointing to a Llama and Call it a Camel: On the Sycophancy of Multimodal Large Language ModelsRenjie Pi, Kehao Miao, Li Peihang, Runtao Liu, Jiahui Gao, Jipeng Zhang, Xiaofang Zhou. 20166-20180 [doi]
- MR. Judge: Multimodal Reasoner as a JudgeRenjie Pi, Haoping Bai, Qibin Chen, Xiaoming Simon Wang, Jiulong Shan, Xiaojiang Liu, Meng Cao. 20181-20205 [doi]
- MobiZO: Enabling Efficient LLM Fine-Tuning at the Edge via Inference EnginesLei Gao, Amir Ziashahabi, Yue Niu, Salman Avestimehr, Murali Annavaram. 20206-20223 [doi]
- Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMsWafa Al Ghallabi, Ritesh Thawkar, Sara Ghaboura, Ketan Pravin More, Omkar Thawakar, Hisham Cholakkal, Salman Khan 0001, Rao Muhammad Anwer. 20224-20244 [doi]
- CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical ReasoningJoshua Ong Jun Leang, Aryo Pradipta Gema, Shay B. Cohen. 20245-20274 [doi]
- s1: Simple test-time scalingNiklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei 0001, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel J. Candès, Tatsunori Hashimoto. 20275-20321 [doi]
- Learning Subjective Label Distributions via Sociocultural DescriptorsMohammed Fayiz Parappan, Ricardo Henao. 20322-20338 [doi]
- COM-BOM: Bayesian Exemplar Search for Efficiently Exploring the Accuracy-Calibration Pareto FrontierGaoxiang Luo, Aryan Deshwal. 20339-20352 [doi]
- ML-Promise: A Multilingual Dataset for Corporate Promise VerificationYohei Seki, Hakusen Shu, Anaïs Lhuissier, Hanwool Lee, Juyeon Kang, Min-Yuh Day, Chung-Chi Chen 0001. 20353-20366 [doi]
- Reading Between the Prompts: How Stereotypes Shape LLM's Implicit PersonalizationVera Neplenbroek, Arianna Bisazza, Raquel Fernández. 20367-20400 [doi]
- Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text GenerationYen-Ju Lu, Thomas Thebaud, Laureano Moro-Velázquez, Najim Dehak, Jesús Villalba 0001. 20401-20423 [doi]
- Please Translate Again: Two Simple Experiments on Whether Human-Like Reasoning Helps TranslationDi Wu, Seth Aycock, Christof Monz. 20424-20440 [doi]
- How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR HeadsIngeol Baek, Hwan Chang, Sunghyun Ryu, Hwanhee Lee. 20441-20453 [doi]
- Explainability and Interpretability of Multilingual Large Language Models: A SurveyLucas Resck, Isabelle Augenstein, Anna Korhonen. 20454-20486 [doi]
- Decoding the Rule Book: Extracting Hidden Moderation Criteria from Reddit CommunitiesYoungWoo Kim, Himanshu Beniwal, Steven L. Johnson, Thomas Hartvigsen. 20487-20498 [doi]
- AcT2I: Evaluating and Improving Action Depiction in Text-to-Image ModelsVatsal Malaviya, Agneet Chatterjee, Maitreya Patel, Yezhou Yang, Chitta Baral. 20499-20516 [doi]
- Assessing French Readability for Adults with Low Literacy: A Global and Local PerspectiveWafa Aissa, Thibault Bañeras Roux, Elodie Vanzeveren, Lingyun Gao, Rodrigo Wilkens, Thomas François. 20517-20539 [doi]
- LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop RetrievalJoohyung Yun, Doyup Lee, Wook-Shin Han. 20540-20559 [doi]
- DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM ReasoningTanmay Parekh, Kartik Mehta, Ninareh Mehrabi, Kai-Wei Chang 0001, Nanyun Peng 0001. 20560-20582 [doi]
- SNaRe: Domain-aware Data Generation for Low-Resource Event DetectionTanmay Parekh, Yuxuan Dong, Lucas Bandarkar, Artin Kim, I-Hung Hsu, Kai-Wei Chang 0001, Nanyun Peng 0001. 20583-20604 [doi]
- Table-R1: Inference-Time Scaling for Table Reasoning TasksZheyuan Yang, Lyuhao Chen, Arman Cohan, Yilun Zhao 0001. 20605-20624 [doi]
- LimRank: Less is More for Reasoning-Intensive Information RerankingTingyu Song, Yilun Zhao 0001, Siyue Zhang, Chen Zhao 0013, Arman Cohan. 20625-20639 [doi]
- PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem SolvingMihir Parmar, Xin Liu, Palash Goyal, Yanfei Chen, Long T. Le, Swaroop Mishra, Hossein Mobahi, Jindong Gu, Zifeng Wang 0002, Hootan Nakhost, Chitta Baral, Chen-Yu Lee, Tomas Pfister, Hamid Palangi. 20640-20666 [doi]
- An Empirical Study on Strong-Weak Model Collaboration for Repo-level Code GenerationShubham Gandhi, Atharva Naik, Yiqing Xie, Carolyn P. Rosé. 20667-20686 [doi]
- What are Foundation Models Cooking in the Post-Soviet World?Anton Lavrouk, Tarek Naous, Alan Ritter, Wei Xu 0004. 20687-20709 [doi]
- LogiDynamics: Unraveling the Dynamics of Inductive, Abductive and Deductive Logical Inferences in LLM ReasoningTianshi Zheng, Cheng Jiayang, Chunyang Li, Haochen Shi, Zihao Wang 0001, Jiaxin Bai, Yangqiu Song, Ginny Y. Wong, Simon See. 20710-20731 [doi]
- EcoLoRA: Communication-Efficient Federated Fine-Tuning of Large Language ModelsHan Liu, Ruoyao Wen, Srijith Nair, Jia Liu 0002, Wenjing Lou, Chongjie Zhang, William Yeoh 0001, Yevgeniy Vorobeychik, Ning Zhang 0017. 20732-20746 [doi]
- Memorization ≠ Understanding: Do Large Language Models Have the Ability of Scenario Cognition?Boxiang Ma, Ru Li, Yuanlong Wang, Hongye Tan, Xiaoli Li. 20747-20763 [doi]
- Priority on High-Quality: Selecting Instruction Data via Consistency Verification of Noise InjectionHong Zhang, Feng Zhao 0003, Ruilin Zhao, Cheng Yan, Kangzheng Liu. 20764-20776 [doi]
- Can Prompts Rewind Time for LLMs? Evaluating the Effectiveness of Prompted Knowledge CutoffsXin Gao, Ruiyi Zhang, Daniel Du, Saurabh Mahindre, Sai Ashish Somayajula, Pengtao Xie. 20777-20788 [doi]
- DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language ModelsYiqiu Guo, Yuchen Yang, Zhe Chen, Pingjie Wang, Yusheng Liao, Ya Zhang, Yanfeng Wang, Yu Wang. 20789-20808 [doi]
- Metric Calculating Benchmark: Code-Verifiable Complicate Instruction Following Benchmark for Large Language ModelsHyeonseok Moon, Seongtae Hong, Jaehyung Seo, HeuiSeok Lim. 20809-20823 [doi]
- Generative Annotation for ASR Named Entity CorrectionYuanchang Luo, Daimeng Wei, Shaojun Li, Hengchao Shang, Jiaxin Guo, Zongyao Li, Zhanglin Wu, Xiaoyu Chen 0004, Zhiqiang Rao, Jinlong Yang, Hao Yang 0006. 20824-20835 [doi]
- SOLAR: Towards Characterizing Subjectivity of Individuals through Modeling Value Conflicts and Trade-offsYounghun Lee, Dan Goldwasser. 20836-20851 [doi]
- LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language ModelsKang He, Kaushik Roy. 20852-20881 [doi]
- Unmasking Fake Careers: Detecting Machine-Generated Career Trajectories via Multi-layer Heterogeneous GraphsMichiharu Yamashita, Thanh Tran 0005, Delvin Ce Zhang, Dongwon Lee 0001. 20882-20897 [doi]
- GAP: a Global Adaptive Pruning Method for Large Language ModelsZhihua Ban, Haotian Ma, Siheng Zhang, Shengyu Liu, Xichen Chen, Ming Yang. 20898-20903 [doi]
- Distribution Prompting: Understanding the Expressivity of Language Models Through the Next-Token Distributions They Can ProduceHaojin Wang, Zining Zhu 0005, Freda Shi. 20904-20917 [doi]
- LGA: LLM-GNN Aggregation for Temporal Evolution Attribute Graph PredictionFeng Zhao, Ruoyu Chai, Kangzheng Liu, Xianggan Liu. 20918-20929 [doi]
- EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language ModelsTao Zou, Xinghua Zhang 0001, Haiyang Yu 0003, Minzheng Wang 0002, Fei Huang 0002, Yongbin Li. 20930-20953 [doi]
- Tool Preferences in Agentic LLMs are UnreliableKazem Faghih, Wenxiao Wang 0002, Yize Cheng, Siddhant Bharti, Gaurang Sriramanan, Sriram Balasubramanian, Parsa Hosseini, Soheil Feizi. 20954-20969 [doi]
- Enhancing Large Language Model for Knowledge Graph Completion via Structure-Aware Alignment-TuningYu Liu, Yanan Cao, Xixun Lin, Yanmin Shang, Shi Wang 0002, Shirui Pan. 20970-20984 [doi]
- MultiDocFusion : Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial DocumentsJoongmin Shin, Chanjun Park, Jeongbae Park, Jaehyung Seo, HeuiSeok Lim. 20985-21004 [doi]
- Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language ModelsQiang Liu 0006, Xinlong Chen, Yue Ding 0009, Bowen Song, Weiqiang Wang, Shu Wu, Liang Wang 0001. 21005-21021 [doi]
- 'Rich Dad, Poor Lad': How do Large Language Models Contextualize Socioeconomic Factors in College Admission ?Huy Nghiem, Phuong-Anh Nguyen-Le, John Prindle, Rachel Rudinger, Hal Daumé III. 21022-21056 [doi]
- Understanding and Mitigating Overrefusal in LLMs from an Unveiling Perspective of Safety Decision BoundaryLicheng Pan, Yongqi Tong, Xin Zhang, Xiaolu Zhang, Jun Zhou 0011, Zhixuan Chu. 21057-21075 [doi]
- MMAG: Multimodal Learning for Mucus Anomaly Grading in Nasal Endoscopy via Semantic Attribute PromptingXinpan Yuan, Mingzhu Huang, Liujie Hua, Jianuo Ju, Xu Zhang. 21076-21086 [doi]
- The Emperor's New Reasoning: Format Imitation Overshadows Genuine Mathematical Understanding in SFTLinyao Yang, Jian-Tao Huang, Yafei Lu, Zhenhui Jessie Li, Guirong Xue. 21087-21100 [doi]
- Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step ReasoningLang Cao, Yingtian Zou, Chao Peng, Renhong Chen, Wu Ning, Yitong Li. 21101-21118 [doi]
- Flexibly Utilize Memory for Long-Term Conversation via a Fragment-then-Compose FrameworkCai Ke, Yiming Du, Bin Liang 0004, Yifan Xiang, Lin Gui 0003, Zhongyang Li, Baojun Wang, Yue Yu 0001, Hui Wang 0030, Kam-Fai Wong, Ruifeng Xu 0001. 21119-21136 [doi]
- STRICT: Stress-Test of Rendering Image Containing TextTianyu Zhang, Xinyu Wang, Lu Li, Zhenghan Tai, Jijun Chi, Jingrui Tian, Hailin He, Suyuchen Wang. 21137-21150 [doi]
- A Sequential Multi-Stage Approach for Code Vulnerability Detection via Confidence- and Collaboration-based Decision MakingChung-Nan Tsai, Xin Wang, Cheng-Hsiung Lee, Ching-Sheng Lin. 21151-21157 [doi]
- Leveraging Large Models to Evaluate Novel Content: A Case Study on Advertisement CreativityZhaoyi Joey Hou, Adriana Kovashka, Xiang Lorraine Li. 21158-21177 [doi]
- BIRD: Bronze Inscription Restoration and DatingWenjie Hua, Hoang H. Nguyen, Gangyan Ge. 21178-21190 [doi]
- DCP: Dual-Cue Pruning for Efficient Large Vision-Language ModelsLei Jiang, Zixun Zhang, Yuting Zeng, Chunzhao Xie, Tongxuan Liu, Zhen Li, Lechao Cheng, XiaoHua Xu. 21191-21204 [doi]
- Improving Context Fidelity via Native Retrieval-Augmented ReasoningSuyuchen Wang, Jinlin Wang, Xinyu Wang, Shiqi Li, Xiangru Tang, Sirui Hong, Xiao-Wen Chang, Chenglin Wu, Bang Liu 0003. 21205-21218 [doi]
- Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free GuidanceShehzeen Samarah Hussain, Paarth Neekhara, Xuesong Yang, Edresson Casanova, Subhankar Ghosh, Roy Fejgin, Mikyas T. Desta, Rafael Valle, Jason Li. 21219-21234 [doi]
- Mixing Inference-time Experts for Enhancing LLM ReasoningSoumya Sanyal 0001, Tianyi Xiao, Xiang Ren 0001. 21235-21249 [doi]
- Reinforced Query Reasoners for Reasoning-intensive Retrieval TasksXubo Qin, Jun Bai, Jiaqi Li, Zixia Jia, Zilong Zheng. 21250-21263 [doi]
- TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache SelectionWei Wu 0045, Zhuoshi Pan, Kun Fu 0002, Chao Wang 0086, Liyi Chen 0001, Yunchu Bai, Tianfu Wang 0002, Zheng Wang 0027, Hui Xiong 0001. 21264-21281 [doi]
- MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language ModelsSiyu Yan, Long Zeng 0004, Xuecheng Wu, Chengcheng Han 0004, Kongcheng Zhang, Chong Peng, Xuezhi Cao, Xunliang Cai, Chenjuan Guo. 21282-21303 [doi]
- EnAnchored-X2X: English-Anchored Optimization for Many-to-Many TranslationSen Yang, Yu Bao, Yu Lu, Jiajun Chen 0001, Shujian Huang, Shanbo Cheng. 21304-21317 [doi]
- "I've Decided to Leak": Probing Internals Behind Prompt Leakage IntentsJianshuo Dong, Yutong Zhang, Liu Yan, Zhenyu Zhong, Tao Wei 0002, Ke Xu 0002, Minlie Huang, Chao Zhang 0008, Han Qiu 0001. 21318-21348 [doi]
- Nullspace Disentanglement for Red Teaming Language ModelsYi Han, Yuanxing Liu 0001, Weinan Zhang 0003, Ting Liu 0001. 21349-21365 [doi]
- Supervised Attention Mechanism for Low-quality Multimodal DataSijie Mai, Shiqin Han, Haifeng Hu. 21366-21386 [doi]
- Reinforcement Learning for Large Language Models via Group Preference Reward ShapingHuaisheng Zhu, Siyuan Xu, Hangfan Zhang, Teng Xiao, Zhimeng Guo, Shijie Zhou 0008, Shuyue Hu, Vasant G. Honavar. 21387-21400 [doi]
- zFLoRA: Zero-Latency Fused Low-Rank AdaptersDhananjaya Gowda, Seoha Song, Harshith Goka, Junhyun Lee. 21401-21418 [doi]
- PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem SolvingMihir Parmar, Palash Goyal, Xin Liu, Yiwen Song, Mingyang Ling, Chitta Baral, Hamid Palangi, Tomas Pfister. 21419-21433 [doi]
- Semantic Inversion, Identical Replies: Revisiting Negation Blindness in Large Language ModelsJinsung Kim, Seonmin Koo, HeuiSeok Lim. 21434-21471 [doi]
- AMACE: Automatic Multi-Agent Chart Evolution for Iteratively Tailored Chart GenerationHyuk Namgoong, Jeesu Jung, Hyeonseok Kang, Yohan Lee, Sangkeun Jung. 21472-21487 [doi]
- ActionStudio: A Lightweight Framework for Data and Training of Large Action ModelsJianguo Zhang 0006, Thai-Hoang, Ming Zhu, Zuxin Liu, Shiyu Wang, Tulika Awalgaonkar, Akshara Prabhakar, Haolin Chen, Weiran Yao, Zhiwei Liu 0001, Juntao Tan, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong. 21488-21502 [doi]
- Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM SafetySeongmin Lee 0007, Aeree Cho, Grace C. Kim, Shengyun Peng, Mansi Phute, Duen Horng Chau. 21503-21534 [doi]
- Unveiling the Response of Large Vision-Language Models to Visually Absent TokensSohee Kim, Soohyun Ryu, Joonhyung Park, Eunho Yang. 21535-21557 [doi]
- Improving Task Diversity in Label Efficient Supervised Finetuning of LLMsAbhinav Arabelly, Jagrut Nemade, Robert D. Nowak, Jifan Zhang. 21558-21570 [doi]
- Look Beyond Feeling: Unveiling Latent Needs from Implicit Expressions for Proactive Emotional SupportXing Fu, Haozhen Li, Bichen Wang, Hao Yang 0066, Yanyan Zhao, Bing Qin 0001. 21571-21598 [doi]
- s3: You Don't Need That Much Data to Train a Search Agent via RLPengcheng Jiang, Xueqiang Xu, Jiacheng Lin, Jinfeng Xiao, Zifeng Wang 0008, Jimeng Sun 0001, Jiawei Han 0001. 21599-21617 [doi]
- FuseChat: Knowledge Fusion of Chat ModelsFanqi Wan, Longguang Zhong, Ziyi Yang, Ruijun Chen 0001, Xiaojun Quan. 21618-21642 [doi]
- Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence TransformersYukun Zhang, Xueqing Zhou. 21643-21663 [doi]
- Forget What You Know about LLMs Evaluations - LLMs are Like a ChameleonNurit Cohen-Inger, Yehonatan Elisha, Bracha Shapira, Lior Rokach, Seffi Cohen. 21664-21677 [doi]
- Memorization or Reasoning? Exploring the Idiom Understanding of LLMsJisu Kim, Youngwoo Shin, Uiji Hwang, Jihun Choi, Richeng Xuan, Taeuk Kim. 21678-21699 [doi]
- RD-MCSA: A Multi-Class Sentiment Analysis Approach Integrating In-Context Classification Rationales and DemonstrationsHaihua Xie, Yinzhu Cheng, Yaqing Wang, Miao He, Mingming Sun. 21700-21723 [doi]
- Puzzled by Puzzles: When Vision-Language Models Can't Take a HintHeekyung Lee, Jiaxin Ge, Tsung-Han Wu, Minwoo Kang, Trevor Darrell, David M. Chan. 21724-21737 [doi]
- CREPE: Rapid Chest X-ray Report Evaluation by Predicting Multi-category Error CountsGihun Cho, Seunghyun Jang, Hanbin Ko, Inhyeok Baek, Chang Min Park. 21738-21755 [doi]
- TIDES: Technical Information Discovery and Extraction SystemJihee Kim, Subeen Park, Hakyung Lee, Yongtaek Lim, Hyo-Won Suh, Kyungwoo Song. 21756-21772 [doi]
- Learning to Ask: When LLM Agents Meet Unclear InstructionWenxuan Wang 0001, Juluan Shi, Zixuan Ling, Yuk-Kit Chan, Chaozheng Wang, Cheryl Lee, Youliang Yuan, Jen-tse Huang 0001, Wenxiang Jiao, Michael R. Lyu. 21773-21784 [doi]
- RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual ReconstructionYuChi Wang, Yishuo Cai, Shuhuai Ren, Sihan Yang, Linli Yao, Yuanxin Liu, Yuanxing Zhang, Pengfei Wan 0001, Xu Sun 0001. 21785-21804 [doi]
- StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy OptimizationXuhui Zheng, Kang An, Ziliang Wang, Yuhang Wang, Yichao Wu. 21805-21830 [doi]
- Dynamic Model-Bank Test-Time Adaptation for Automatic Speech RecognitionYanshuo Wang, Yanghao Zhou, Yukang Lin, Haoxing Chen, Jin Zhang, Wentao Zhu, Jie Hong, Xuesong Li. 21831-21841 [doi]
- Mitigating Catastrophic Forgetting in Large Language Models with Forgetting-aware PruningWei Huang 0039, Anda Cheng, Yinggui Wang. 21842-21856 [doi]
- Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language ModelsHwiyeong Lee, Uiji Hwang, Hyelim Lim, Taeuk Kim. 21857-21869 [doi]
- ArgCMV: An Argument Summarization Benchmark for the LLM-eraOmkar Gurjar, Agam Goyal, Eshwar Chandrasekharan. 21870-21883 [doi]
- VistaWise: Building Cost-Effective Agent with Cross-Modal Knowledge Graph for MinecraftHonghao Fu, Junlong Ren, Qi Chai, Deheng Ye, Yujun Cai, Hao Wang. 21884-21898 [doi]
- GraphKV: Breaking the Static Selection Paradigm with Graph-Based KV Cache EvictionXuelin Li, Xiangqi Jin, Linfeng Zhang. 21899-21909 [doi]
- Joint Modeling of Entities and Discourse Relations for Coherence AssessmentWei Liu, Michael Strube 0001. 21910-21926 [doi]
- Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMsJun Bai, Minghao Tong, Yang Liu, Zixia Jia, Zilong Zheng. 21927-21942 [doi]
- HMoE: Heterogeneous Mixture of Experts for Language ModelingAn Wang, Xingwu Sun, Ruobing Xie, Shuaipeng Li, Jiaqi Zhu 0004, Zhen Yang, Pinxue Zhao, Weidong Han 0006, Zhanhui Kang, Di Wang 0052, Naoaki Okazaki, Cheng-Zhong Xu 0001. 21943-21957 [doi]
- The Ranking Blind Spot: Decision Hijacking in LLM-based Text RankingYaoyao Qian, Yifan Zeng, Yuchao Jiang, Chelsi Jain, Huazheng Wang. 21958-21968 [doi]
- Uniform Information Density and Syntactic Reduction: Revisiting *that*-Mentioning in English Complement ClausesHailin Hao, Elsi Kaiser. 21969-21983 [doi]
- GRIT: Guided Relational Integration for Efficient Multi-Table UnderstandingYujin Kang, Park Seong Woo, Yoon-Sik Cho. 21984-21997 [doi]
- RPDR: A Round-trip Prediction-Based Data Augmentation Framework for Long-Tail Question AnsweringYiming Zhang, Siyue Zhang, Junbo Zhao, Chen Zhao. 21998-22012 [doi]
- Discrepancy Detection at the Data Level: Toward Consistent Multilingual Question AnsweringLorena Calvo-Bartolomé, Valérie Aldana, Karla Cantarero, Alonso Madroñal de Mesa, Jerónimo Arenas-García, Jordan Lee Boyd-Graber. 22013-22054 [doi]
- Data-Efficient Selection via Grammatical Complexity in Continual Pre-training of Domain-Specific LLMsYizhou Ying, Geng Zhang, Cui Danxin, Chengyu Du, Guanglei Yue, Sihang Jiang, Jiaqing Liang, Yifei Fu, Hailin Hu, Yanghua Xiao. 22055-22069 [doi]
- Comprehensive and Efficient Distillation for Lightweight Sentiment Analysis ModelsGuangyu Xie, Yice Zhang, Jianzhu Bao, Qianlong Wang 0001, Yang Sun, Bingbing Wang, Ruifeng Xu 0001. 22070-22091 [doi]
- One Planner To Guide Them All ! Learning Adaptive Conversational Planners for Goal-oriented DialoguesHuy Quang Dao, Lizi Liao. 22092-22116 [doi]
- Unsupervised Hallucination Detection by Inspecting Reasoning ProcessesPonhvoan Srey, Xiaobao Wu, Anh Tuan Luu. 22117-22129 [doi]
- Multimodal Neural Machine Translation: A Survey of the State of the ArtYi Feng, Chuanyi Li, Jiatong He, Zhenyu Hou, Vincent Ng. 22130-22147 [doi]
- Lemmatization of Polish Multi-word ExpressionsMagdalena Król, Aleksander Smywinski-Pohl, Zbigniew Kaleta, Pawel Lewkowicz. 22148-22157 [doi]
- Targeted Distillation for Sentiment AnalysisYice Zhang, Guangyu Xie, Jingjie Lin, Jianzhu Bao, Qianlong Wang 0001, Xi Zeng, Ruifeng Xu 0001. 22158-22181 [doi]
- DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM JailbreakHao Wang 0003, Hao Li 0031, Junda Zhu 0003, Xinyuan Wang 0009, Chengwei Pan, Minlie Huang, Lei Sha. 22182-22194 [doi]
- Rank-Awareness and Angular Constraints: A New Perspective on Learning Sentence Embeddings from NLI DataZicheng Zhou, Min Huang, Qinghai Miao. 22195-22209 [doi]
- LLM-Guided Semantic Relational Reasoning for Multimodal Intent RecognitionQianrui Zhou, Hua Xu, Yifan Wang, Xinzhi Dong, Hanlei Zhang. 22210-22226 [doi]
- Seeing Culture: A Benchmark for Visual Reasoning and GroundingBurak Satar, Zhixin Ma 0001, Patrick Amadeus Irawan, Wilfried A. Mulyawan, Jing Jiang 0001, Ee-Peng Lim, Chong-Wah Ngo. 22227-22243 [doi]
- GRADA: Graph-based Reranking against Adversarial Documents AttackJingjie Zheng, Aryo Pradipta Gema, Giwon Hong, Xuanli He, Pasquale Minervini, Youcheng Sun, Qiongkai Xu. 22244-22266 [doi]
- Orchestrating Audio: Multi-Agent Framework for Long-Video Audio SynthesisYehang Zhang, Xinli Xu, Xiaojie Xu, Doudou Zhang, Li Liu, Ying-Cong Chen. 22267-22282 [doi]
- MADAWSD: Multi-Agent Debate Framework for Adversarial Word Sense DisambiguationKaiyuan Zhang, Qian Liu 0012, Luyang Zhang, Chaoqun Zheng, Shuaimin Li, Bing Xu, Muyun Yang, Xinxiao Qiao, Wenpeng Lu. 22283-22302 [doi]
- Interpretable Text Embeddings and Text Similarity Explanation: A SurveyJuri Opitz, Lucas Möller, Andrianos Michail, Sebastian Padó, Simon Clematide. 22303-22319 [doi]
- Dyve: Thinking Fast and Slow for Dynamic Process VerificationJianyuan Zhong, Zeju Li, Zhijian Xu, Xiangyu Wen 0001, Qiang Xu 0001. 22320-22333 [doi]
- PERSEVAL: A Framework for Perspectivist Classification EvaluationSoda Marem Lo, Silvia Casola, Erhan Sezerer, Valerio Basile, Franco Sansonetti, Antonio Uva 0001, Davide Bernardi. 22334-22359 [doi]
- Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment QualityYuto Harada, Yusuke Yamauchi, Yusuke Oda, Yohei Oseki, Yusuke Miyao, Yu Takagi. 22360-22381 [doi]
- IndiGEC: Multilingual Grammar Error Correction for Low-Resource Indian LanguagesUjjwal Sharma 0004, Pushpak Bhattacharyya. 22382-22396 [doi]
- Bias Beware: The Impact of Cognitive Biases on LLM-Driven Product RecommendationsGiorgos Filandrianos, Angeliki Dimitriou, Maria Lymperaiou, Konstantinos Thomas, Giorgos Stamou. 22397-22426 [doi]
- T2R-BENCH: A Benchmark for Real World Table-to-Report TaskJie Zhang, Changzai Pan, Sishi Xiong, Kaiwen Wei, Yu Zhao, Xiangyu Li, Jiaxin Peng, Xiaoyan Gu, Jian Yang, Wenhan Chang, Zhenhe Wu, Jiang Zhong, Shuangyong Song, Xuelong Li. 22427-22451 [doi]
- TCP: a Benchmark for Temporal Constraint-Based PlanningZifeng Ding, Sikuan Yan, Moy Yuan, Xianglong Hu, Fangru Lin, Andreas Vlachos 0001. 22452-22475 [doi]
- The Role of Outgoing Connection Heterogeneity in Feedforward Layers of Large Language ModelsFelix Stahlberg, Shankar Kumar. 22476-22484 [doi]
- Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic AgentsManan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Vivek Gupta, Dinesh Manocha. 22485-22508 [doi]
- Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn DialogLautaro Estienne, Gabriel Ben Zenou, Nona Naderi, Jackie CK Cheung, Pablo Piantanida. 22509-22523 [doi]
- Understanding Subword Compositionality of Large Language ModelsQiwei Peng 0003, Yekun Chai, Anders Søgaard. 22524-22535 [doi]
- Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMsZhipeng Yang, Junzhuo Li, Siyu Xia, Xuming Hu. 22536-22564 [doi]
- From Understanding to Generation: An Efficient Shortcut for Evaluating Language ModelsViktor Hangya, Fabian Küch, Darina Gold. 22565-22581 [doi]
- Debiasing Multilingual LLMs in Cross-lingual Latent SpaceQiwei Peng 0003, Guimin Hu, Yekun Chai, Anders Søgaard. 22582-22593 [doi]
- Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document EmbeddingsMax Conti, Manuel Faysse, Gautier Viaud, Antoine Bosselut, Céline Hudelot, Pierre Colombo. 22594-22608 [doi]
- MS-RAG: Simple and Effective Multi-Semantic Retrieval-Augmented GenerationXiaozhou You, Yahui Luo, Lihong Gu. 22609-22625 [doi]
- Transitive self-consistency evaluation of NLI models without gold labelsWei Wu, Mark Last. 22626-22642 [doi]
- MiLQ: Benchmarking IR Models for Bilingual Web Search with Mixed Language QueriesJonghwi Kim, Deokhyung Kang, Seonjeong Hwang, Yunsu Kim 0001, Jungseul Ok, Gary Lee 0001. 22643-22659 [doi]
- Enhancing Chinese Offensive Language Detection with Homophonic PerturbationJunqi Wu, Shujie Ji, Kang Zhong, Huiling Peng, Zhendongxiao, Xiongding Liu, Wu Wei. 22660-22675 [doi]
- Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing StylesKimberly Le Truong, Riccardo Fogliato, Hoda Heidari, Steven Wu 0001. 22676-22709 [doi]
- Computational Analysis of Character Development in Holocaust TestimoniesEsther Shizgal, Eitan Wagner, Renana Keydar, Omri Abend. 22710-22734 [doi]
- TASO: Task-Aligned Sparse Optimization for Parameter-Efficient Model AdaptationDaiye Miao, Yufang Liu, Jie Wang, Changzhi Sun, Yunke Zhang, Demei Yan, Shaokang Dong, Qi Zhang, Yuanbin Wu. 22735-22747 [doi]
- Dual-Path Counterfactual Integration for Multimodal Aspect-Based Sentiment ClassificationRui Liu, Jiahao Cao 0002, Jiaqian Ren, Xu Bai, Yanan Cao. 22748-22758 [doi]
- Job Unfair: An Investigation of Gender and Occupational Bias in Free-Form Text Completions by LLMsCamilla Casula, Sebastiano Vecellio Salto, Elisa Leonardelli, Sara Tonelli. 22759-22777 [doi]
- C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex ConversationsChengqian Ma, Wei Tao, Steven Y. Guo. 22778-22796 [doi]
- Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes FromChangjiang Gao, Hankun Lin, Xin Huang, Xue Han 0018, Junlan Feng, Chao Deng, Jiajun Chen 0001, Shujian Huang. 22797-22826 [doi]
- Blind Men and the Elephant: Diverse Perspectives on Gender Stereotypes in Benchmark DatasetsMahdi Zakizadeh, Mohammad Taher Pilehvar. 22827-22840 [doi]
- Linguistic and Embedding-Based Profiling of Texts Generated by Humans and Large Language ModelsSergio E. Zanotto, Segun Aroyehun. 22841-22858 [doi]
- An Interdisciplinary Approach to Human-Centered Machine TranslationMarine Carpuat, Omri Asscher, Kalika Bali, Luisa Bentivogli, Frédéric Blain, Lynne Bowker, Monojit Choudhury, Hal Daumé III, Kevin Duh, Ge Gao 0001, Alvin Grissom II, Marzena Karpinska, Elaine C. Khoong, William D. Lewis, André F. T. Martins, Mary Nurminen, Douglas W. Oard, Maja Popovic, Michel Simard, François Yvon. 22859-22879 [doi]
- Exploring the Hidden Capacity of LLMs for One-Step Text GenerationGleb Mezentsev, Ivan V. Oseledets. 22880-22889 [doi]
- Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV OptimizationGuanghui Song, Dongping Liao, Yiren Zhao, Kejiang Ye, Chengzhong Xu 0001, Xitong Gao. 22890-22903 [doi]
- PathwiseRAG: Multi-Dimensional Exploration and Integration FrameworkHengrui Zhang, Pin-Siang Huang, Zhen Zhang, Peican Lin, Yao-Ching Yu, Bo Hu, Yulu Du. 22904-22925 [doi]
- "Mm, Wat?" Detecting Other-initiated Repair Requests in DialogueAnh Ngo, Nicolas Rollet, Catherine Pelachaud, Chloé Clavel. 22926-22939 [doi]
- R-BPE: Improving BPE-Tokenizers with Token ReuseNancy Hamdan, Osama Rakan Al Mraikhat, Fadi A. Zaraket. 22940-22948 [doi]
- Language Models Can be Efficiently Steered via Minimal Embedding Layer TransformationsDiogo Tavares, David Semedo, Alexander Rudnicky, João Magalhães. 22949-22967 [doi]
- Adversarial Attacks Against Automated Fact-Checking: A SurveyFanzhen Liu, Sharif Abuadbba, Kristen Moore, Surya Nepal, Cécile Paris, Jia Wu 0001, Jian Yang 0003, Quan Z. Sheng. 22968-22990 [doi]
- WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?An-Lan Wang, Jingqun Tang, Lei Liao, Hao Feng 0009, Qi Liu, Xiang Fei, Jinghui Lu, Han Wang, Hao Liu 0003, Yuliang Liu, Xiang Bai, Can Huang 0002. 22991-23001 [doi]
- DCR: Quantifying Data Contamination in LLMs EvaluationCheng Xu 0006, Nan Yan, Shuhao Guan, Changhong Jin, Yuke Mei, Yibing Guo, M. Tahar Kechadi. 23002-23020 [doi]
- Building Trust in Clinical LLMs: Bias Analysis and Dataset TransparencySvetlana Maslenkova, Clement Christophe, Marco AF Pimentel, Tathagata Raha, Muhammad Umar Salman, Ahmed Al Mahrooqi, Avani Gupta, Shadab Khan, Ronnie Rajan, Praveen K. Kanithi. 23021-23044 [doi]
- Surprise Calibration for Better In-Context LearningZhihang Tan, Jingrui Hou, Ping Wang 0028, Qibiao Hu, Peng Zhu. 23045-23060 [doi]
- SPARK: Simulating the Co-evolution of Stance and Topic Dynamics in Online Discourse with LLM-based AgentsBowen Zhang 0005, Yi Yang, Fuqiang Niu, Xianghua Fu, Genan Dai, Hu Huang 0009. 23061-23073 [doi]
- Drivel-ology: Challenging LLMs with Interpreting Nonsense with DepthYang Wang, Chenghao Xiao, Chia-Yi Hsiao, Zi Yan Chang, Chi-Li Chen, Tyler Loakman, Chenghua Lin. 23074-23096 [doi]
- Can Large Language Models be Effective Online Opinion Miners?Ryang Heo, Yongsik Seo, Junseong Lee, Dongha Lee 0003. 23097-23136 [doi]
- Can Large Language Models Translate Unseen Languages in Underrepresented Scripts?Dianqing Lin, Aruukhan, Hongxu Hou, Shuo Sun, Wei Chen, Yichen Yang, Guodong Shi. 23137-23150 [doi]
- InterIDEAS: Philosophical Intertextuality via LLMsYue Yang, Yinzhi Xu, Chenghao Huang, JohnMichael Jurgensen, Han Hu, Hao Wang. 23151-23172 [doi]
- KCS: Diversify Multi-hop Question Generation with Knowledge Composition SamplingYangfan Wang, Jie Liu, Chen Tang, Lian Yan, Jingchi Jiang. 23173-23185 [doi]
- Fooling the LVLM Judges: Visual Biases in LVLM-Based EvaluationYerin Hwang, Dongryeol Lee, Kyungmin Min, Taegwan Kang, Yongil Kim, Kyomin Jung. 23186-23205 [doi]
- Disentangled Information Bottleneck for Adversarial Text DefenseYidan Xu, Xinghao Yang, Wei Liu, Bao-Di Liu, Weifeng Liu. 23206-23218 [doi]
- How do Language Models Reshape Entity Alignment? A Survey of LM-Driven EA Methods: Advances, Benchmarks, and FutureZerui Chen, Huiming Fan, Qianyu Wang, Tao He 0014, Ming Liu 0004, Heng Chang, Weijiang Yu, Ze Li, Bing Qin 0001. 23219-23234 [doi]
- Enhancing LLM-Based Social Bot via an Adversarial Learning FrameworkFanqi Kong, Xiaoyuan Zhang, Xinyu Chen, Yaodong Yang 0001, Song Chun Zhu, Xue Feng. 23235-23260 [doi]
- GER-LLM: Efficient and Effective Geospatial Entity Resolution with Large Language ModelHaojia Zhu, Zhicheng Li, Jiahui Jin. 23261-23277 [doi]
- CodeRAG: Finding Relevant and Necessary Knowledge for Retrieval-Augmented Repository-Level Code CompletionSheng Zhang, Yifan Ding, Shuquan Lian, Shun Song, Hui Li. 23278-23288 [doi]
- Searching for the Most Human-like Emergent LanguageBrendon Boldt, David R. Mortensen. 23289-23307 [doi]
- Does Context Matter? A Prosodic Comparison of English and Spanish in Monolingual and Multilingual Discourse SettingsDebasmita Bhattacharya, David Sasu, Michela Marchini, Natalie Schluter, Julia Hirschberg. 23308-23322 [doi]
- ZERA: Zero-init Instruction Evolving Refinement Agent - From Zero Instructions to Structured Prompts via Principle-based OptimizationSeungyoun Yi, Minsoo Khang, Sungrae Park. 23323-23337 [doi]
- Toward Machine Interpreting: Lessons from Human Interpreting StudiesMatthias Sperber, Maureen de Seyssel, Jiajun Bao, Matthias Paulik. 23338-23353 [doi]
- FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure GamesJaewoo Ahn, Junseo Kim, Heeseung Yun, Jaehyeon Son, Dongmin Park, Jaewoong Cho, Gunhee Kim. 23354-23384 [doi]
- FLARE: Faithful Logic-Aided Reasoning and ExplorationErik Arakelyan, Pasquale Minervini, Patrick S. H. Lewis, Pat Verga, Isabelle Augenstein. 23385-23403 [doi]
- Discourse-Driven Code-Switching: Analyzing the Role of Content and Communicative Function in Spanish-English Bilingual SpeechDebasmita Bhattacharya, Juan Junco, Divya Tadimeti, Julia Hirschberg. 23404-23419 [doi]
- Can Large Language Models Translate Spoken-Only Languages through International Phonetic Transcription?Jiale Chen, Xuelian Dong, Qihao Yang, Wenxiu Xie, Tianyong Hao. 23420-23435 [doi]
- ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific ChartsRuiran Su, Jiasheng Si, Zhijiang Guo, Janet B. Pierrehumbert. 23436-23458 [doi]
- Bridging the Gap Between Molecule and Textual Descriptions via Substructure-aware AlignmentHyuntae Park, Yeachan Kim, SangKeun Lee. 23459-23479 [doi]
- SLlama: Parameter-Efficient Language Model Architecture for Enhanced Linguistic Competence Under Strict Data ConstraintsVictor Adelakun Omolaoye, Babajide Alamu Owoyele, Gerard de Melo. 23480-23495 [doi]
- What You See is What You Ask: Evaluating Audio DescriptionsDivy Kala, Eshika Khandelwal, Makarand Tapaswi. 23496-23518 [doi]
- TAPS: Tool-Augmented Personalisation via Structured TaggingEkaterina Taktasheva, Jeff Dalton 0002. 23519-23544 [doi]
- Investigating How Pre-training Data Leakage Affects Models' Reproduction and Detection CapabilitiesMasahiro Kaneko, Timothy Baldwin. 23545-23555 [doi]
- Walk and Read Less: Improving the Efficiency of Vision-and-Language Navigation via Tuning-Free Multimodal Token PruningWenda Qin, Andrea Burns, Bryan A. Plummer, Margrit Betke. 23556-23570 [doi]
- Connecting the Knowledge Dots: Retrieval-augmented Knowledge Connection for Commonsense ReasoningJunho Kim, Soyeon Bak, Mingyu Lee, Minju Hong, Songha Kim, Tae-Eui Kam, SangKeun Lee. 23571-23590 [doi]
- Agent-as-Judge for Factual Summarization of Long NarrativesYeonseok Jeong, Minsoo Kim, Seung-won Hwang, Byung-Hak Kim. 23591-23608 [doi]
- DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text GenerationMiriam Wanner, Benjamin Van Durme, Mark Dredze. 23609-23626 [doi]
- RAcQUEt: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMsAlberto Testoni, Barbara Plank, Raquel Fernández. 23627-23647 [doi]
- Resource-Rational Noisy-Channel Language Processing: Testing the Effect of Algorithmic Constraints on InferencesThomas Hikaru Clark, Jacob Hoover Vigly, Edward Gibson, Roger P. Levy. 23648-23661 [doi]
- In Benchmarks We Trust ... Or Not?Ine Gevers, Victor De Marez, Jens Van Nooten, Jens Lemmens, Andriy Kosar, Ehsan Lotfi 0002, Nikolay Banar, Pieter Fivez, Luna De Bruyne, Walter Daelemans. 23662-23676 [doi]
- Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing AgentsXueqiao Zhang, Chao Zhang, Jingtao Xu, Yifan Zhu, Xin Shi, Yi Yang 0001, Yawei Luo. 23677-23703 [doi]
- Discriminating Form and Meaning in Multilingual Models with Minimal-Pair ABX TasksMaureen de Seyssel, Jie Chi, Skyler Seto, Maartje ter Hoeve, Masha Fedzechkina, Natalie Schluter. 23704-23725 [doi]
- Rethinking Text-based Protein Understanding: Retrieval or LLM?Juntong Wu, Zijing Liu, He Cao, Li Hao, Bin Feng, Zishan Shu, Ke Yu, Li Yuan, Yu Li 0006. 23726-23746 [doi]
- Grounded Semantic Role Labelling from Synthetic Multimodal Data for Situated Robot CommandsClaudiu Daniel Hromei, Antonio Scaiella, Danilo Croce, Roberto Basili 0001. 23747-23770 [doi]
- Easy as PIE? Identifying Multi-Word Expressions with LLMsKai Golan Hashiloni, Ofri Hefetz, Kfir Bar. 23771-23790 [doi]
- Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-rankingWuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen 0001, Xi Ye 0003. 23791-23805 [doi]
- Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme DetectionJingbiao Mei, Jinghong Chen, Guangyu Yang, Weizhe Lin, Bill Byrne. 23806-23828 [doi]
- Audio-Reasoner: Improving Reasoning Capability in Large Audio Language ModelsZhifei Xie, Mingbao Lin, Zihang Liu, Pengcheng Wu, Shuicheng Yan, Chunyan Miao. 23829-23851 [doi]
- From perception to production: how acoustic invariance facilitates articulatory learning in a self-supervised vocal imitation modelMarvin Lavechin, Thomas Hueber. 23852-23863 [doi]
- REALM: Recursive Relevance Modeling for LLM-based Document Re-RankingPinhuan Wang, Zhiqiu Xia, Chunhua Liao, Feiyi Wang, Hang Liu. 23864-23878 [doi]
- PLLuM-Align: Polish Preference Dataset for Large Language Model AlignmentKarolina Seweryn, Anna Kolos, Agnieszka Karlinska, Katarzyna Lorenc, Katarzyna Dziewulska, Maciej Chrabaszcz, Aleksandra Krasnodebska, Paula Betscher, Zofia Cieslinska, Katarzyna Kowol, Julia Moska, Dawid Motyka, Pawel Walkowiak, Bartosz Zuk, Arkadiusz Janz. 23879-23908 [doi]
- Graph-R1: Incentivizing the Zero-Shot Graph Learning Capability in LLMs via Explicit ReasoningYicong Wu, Guangyue Lu, Yuan Zuo, Huarong Zhang, Junjie Wu. 23909-23927 [doi]
- Scalable and Culturally Specific Stereotype Dataset Construction via Human-LLM CollaborationWeicheng Ma, John J. Guerrerio, Soroush Vosoughi. 23928-23956 [doi]
- Can Large Language Models Be Good Language Teachers?Liqing Xu, Qiwei Li 0002, Tianshuo Peng, Zuchao Li, Hai Zhao 0001, Ping Wang 0028. 23957-23971 [doi]
- Empowering Math Problem Generation and Reasoning for Large Language Model via Synthetic Data based Continual Learning FrameworkQian Wan 0007, Wangzi Shi, Jintian Feng, Shengyingjie Liu, Luona Wei, Zhicheng Dai, Jianwen Sun. 23972-23991 [doi]
- Tokenization and Representation Biases in Multilingual Models on Dialectal NLP TasksVani Kanjirangat, Tanja Samardzic, Ljiljana Dolamic, Fabio Rinaldi 0001. 23992-24010 [doi]
- Evaluating the Evaluators: Are readability metrics good measures of readability?Isabel Cachola, Daniel Khashabi, Mark Dredze. 24011-24027 [doi]
- Text Takes Over: A Study of Modality Bias in Multimodal Intent DetectionAnkan Mullick, Saransh Sharma, Abhik Jana, Pawan Goyal 0002. 24028-24058 [doi]
- What's in a prompt? Language models encode literary style in prompt embeddingsRaphaël Sarfati, Haley Moller, Toni J. B. Liu, Nicolas Boullé, Christopher J. Earls. 24059-24068 [doi]
- Identifying and Answering Questions with False Assumptions: An Interpretable ApproachZijie Wang, Eduardo Blanco. 24069-24087 [doi]
- VisFinEval: A Scenario-Driven Chinese Multimodal Benchmark for Holistic Financial UnderstandingZhaowei Liu, Xin Guo, Haotian Xia, Lingfeng Zeng, Fangqi Lou, Jinyi Niu, Mengping Li, Qi Qi, Jiahuan Li, Wei Zhang, Yinglong Wang, Weige Cai, Weining Shen, Liwen Zhang. 24088-24146 [doi]
- Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right QuestionsDavid Acuna, Ximing Lu, Jaehun Jung, Hyunwoo Kim 0002, Amlan Kar, Sanja Fidler, Yejin Choi 0001. 24147-24160 [doi]
- LLMs Don't Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual ExplanationsHarry Mayne, Ryan Othniel Kearns, Yushi Yang, Andrew M. Bean, Eoin D. Delaney, Chris Russell 0001, Adam Mahdi. 24161-24186 [doi]
- Grounding Multilingual Multimodal LLMs With Cultural KnowledgeJean de Dieu Nyandwi, Yueqi Song, Simran Khanuja, Graham Neubig. 24187-24231 [doi]
- Following Length Constraints in InstructionsWeizhe Yuan, Ilia Kulikov, Ping Yu, KyungHyun Cho, Sainbayar Sukhbaatar, Jason E. Weston, Jing Xu 0014. 24232-24243 [doi]
- Memory-QA: Answering Recall Questions Based on Multimodal MemoriesHongda Jiang, Xinyuan Zhang, Siddhant Garg, Rishab Arora, Shiunzu Kuo, Jiayang Xu, Aaron Colak, Xin Luna Dong. 24244-24266 [doi]
- NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM JailbreaksJavad Rafiei-Asl, Sidhant Narula, Mohammad GhasemiGol, Eduardo Blanco 0002, Daniel Takabi. 24267-24295 [doi]
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired SketchingSimon A. Aytes, Jinheon Baek, Sung Ju Hwang. 24296-24320 [doi]
- From Language to Cognition: How LLMs Outgrow the Human Language NetworkBadr AlKhamissi, Greta Tuckute, Yingtian Tang, Taha Osama A Binhuraib, Antoine Bosselut, Martin Schrimpf. 24321-24339 [doi]
- Logos as a Well-Tempered Pre-train for Sign Language RecognitionIlya Ovodov, Petr Surovtsev, Karina Kvanchiani, Alexander Kapitanov, Alexander Nagaev. 24340-24353 [doi]
- Hallucination Detection in LLMs Using Spectral Features of Attention MapsJakub Binkowski, Denis Janiak, Albert Sawczyn, Bogdan Gabrys, Tomasz Kajdanowicz. 24354-24385 [doi]
- Composable Cross-prompt Essay Scoring by Merging ModelsSanwoo Lee, Kun Liang, Yunfang Wu. 24386-24400 [doi]
- Towards a Holistic and Automated Evaluation Framework for Multi-Level Comprehension of LLMs in Book-Length ContextsYuho Lee, Jiaqi Deng, Nicole Hee-Yeon Kim, Hyangsuk Min, Taewon Yun, Minjeong Ban, Kim Yul, Hwanjun Song. 24401-24425 [doi]
- Improving Large Language Models Function Calling and Interpretability via Guided-Structured TemplatesHy Dang, Tianyi Liu, Zhuofeng Wu 0005, Jingfeng Yang 0001, Haoming Jiang, Tao Yang, Pei Chen, Zhengyang Wang, Helen Wang, Huasheng Li, Bing Yin, Meng Jiang 0001. 24426-24442 [doi]
- Evaluation and Facilitation of Online Discussions in the LLM Era: A SurveyKaterina Korre, Dimitris Tsirmpas, Nikos Gkoumas, Emma Cabalé, Danai Myrtzani, Theodoros Evgeniou, Ion Androutsopoulos, John Pavlopoulos. 24443-24462 [doi]
- Temporal Scaling Law for Large Language ModelsYizhe Xiong, Xiansheng Chen, Xin Ye, Hui Chen 0013, Zijia Lin, Haoran Lian, Zhenpeng Su, Wei Huang, Jianwei Niu 0002, Jungong Han, Guiguang Ding. 24463-24483 [doi]
- Reframe Your Life Story: Interactive Narrative Therapist and Innovative Moment Assessment with Large Language ModelsYi Feng, Jiaqi Wang 0006, Wenxuan Zhang, Zhuang Chen 0002, Yutong Shen, Xiyao Xiao, Minlie Huang, Liping Jing, Jian Yu 0001. 24484-24509 [doi]
- From Word to World: Evaluate and Mitigate Culture Bias in LLMs via Word Association TestXunlian Dai, Li Zhou 0010, Benyou Wang, Haizhou Li 0001. 24510-24526 [doi]
- Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic DataShenglai Zeng, Jiankun Zhang, Pengfei He, Jie Ren 0019, Tianqi Zheng, Hanqing Lu, Han Xu 0002, Hui Liu 0003, Yue Xing 0002, Jiliang Tang. 24527-24558 [doi]
- AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak DefenderWeixiang Zhao, Jiahe Guo, Yulin Hu, Yang Deng 0002, An Zhang 0003, Xingyu Sui, Xinyang Han, Yanyan Zhao, Bing Qin 0001, Tat-Seng Chua, Ting Liu 0001. 24559-24577 [doi]
- Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and OpportunitiesChuangtao Ma, Yongrui Chen 0002, Tianxing Wu 0001, Arijit Khan 0001, Haofen Wang. 24578-24597 [doi]
- TFDP: Token-Efficient Disparity Audits for Autoregressive LLMs via Single-Token Masked EvaluationInderjeet Singh 0001, Ramya Srinivasan, Roman Vainshtein, Hisashi Kojima. 24598-24615 [doi]
- Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and TranscreationLi Zhou 0010, Lutong Yu, Dongchu Xie, Shaohuan Cheng, Wenyan Li 0001, Haizhou Li 0001. 24616-24638 [doi]
- MERMAID: Multi-perspective Self-reflective Agents with Generative Augmentation for Emotion RecognitionZhongyu Yang, Junhao Song, Siyang Song, Wei Pang, Yingfang Yuan. 24639-24655 [doi]
- Personality Vector: Modulating Personality of Large Language Models by Model MergingSeungjong Sun, Seo Yeon Baek, Jang-Hyun Kim. 24656-24677 [doi]
- Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language ModelsRuibin Xiong, Yimeng Chen, Dmitrii Khizbullin, Mingchen Zhuge, Jürgen Schmidhuber. 24678-24714 [doi]
- Hidden in Plain Sight: Reasoning in Underspecified and Misspecified Scenarios for Multimodal LLMsQianqi Yan, Hongquan Li, Shan Jiang, Yang Zhao, Xinze Guan, Ching-Chen Kuo, Xin Eric Wang. 24715-24735 [doi]
- PrimeX: A Dataset of Worldview, Opinion, and ExplanationRik Koncel-Kedziorski, Brihi Joshi, Tim Paek. 24736-24761 [doi]
- LASER: An LLM-based ASR Scoring and Evaluation RubricAmruta Parulekar, Preethi Jyothi. 24762-24771 [doi]
- Improving Zero-shot Sentence Decontextualisation with Content Selection and PlanningZhenyun Deng, Yulong Chen 0001, Andreas Vlachos 0001. 24772-24788 [doi]
- Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented GenerationJiankun Zhang, Shenglai Zeng, Jie Ren 0019, Tianqi Zheng, Hui Liu 0031, Xianfeng Tang, Hui Liu 0031, Yi Chang 0001. 24789-24810 [doi]
- Code Execution as Grounded Supervision for LLM ReasoningDongwon Jung, Wenxuan Zhou 0002, Muhao Chen 0001. 24811-24822 [doi]
- Subjective Behaviors and Preferences in LLM: Language of BrowsingSai Sundaresan, Harshita Chopra, Atanu R. Sinha, Koustava Goswami, Nagasai Saketh Naidu, Raghav Karan, N. Anushka. 24823-24836 [doi]
- Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual CounterfactsMichal Golovanevsky, William Rudman, Michael A. Lepori, Amir Bar, Ritambhara Singh, Carsten Eickhoff. 24837-24852 [doi]
- Balcony: A Lightweight Approach to Dynamic Inference of Generative Language ModelsBenyamin Jamialahmadi 0001, Parsa Kavehzadeh, Mehdi Rezagholizadeh, Parsa Farinneya, Hossein Rajabzadeh, Aref Jafari, Boxing Chen, Marzieh S. Tahaei. 24853-24867 [doi]
- Social Genome: Grounded Social Reasoning Abilities of Multimodal ModelsLeena Mathur, Marian Qian, Paul Pu Liang, Louis-Philippe Morency. 24868-24891 [doi]
- Profiler: Black-box AI-generated Text Origin Detection via Context-aware Inference Pattern AnalysisHanxi Guo, Siyuan Cheng 0005, Xiaolong Jin 0002, Zhuo Zhang 0002, Guangyu Shen, Kaiyuan Zhang 0002, Shengwei An, Guanhong Tao 0001, Xiangyu Zhang 0001. 24892-24912 [doi]
- Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMsDingdong Wang, Junan Li, Mingyu Cui, Dongchao Yang, Xueyuan Chen, Helen M. Meng. 24913-24924 [doi]
- RAG-Zeval: Enhancing RAG Responses Evaluator through End-to-End Reasoning and Ranking-Based Reinforcement LearningKun Li 0003, Yunxiang Li, Tianhua Zhang, Hongyin Luo, Xixin Wu, James R. Glass, Helen M. Meng. 24925-24943 [doi]
- Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-IndexHao Xu, Jiacheng Liu 0010, Yejin Choi 0001, Noah A. Smith, Hannaneh Hajishirzi. 24944-24969 [doi]
- Mahānāma: A Unique Testbed for Literary Entity Discovery and LinkingSujoy Sarkar, Gourav Sarkar, Manoj Balaji Jagadeeshan, Jivnesh Sandhan, Amrith Krishna, Pawan Goyal 0002. 24970-24984 [doi]
- Adaptively profiling models with task elicitationDavis Brown, Prithvi Balehannina, Helen Jin, Shreya Havaldar, Hamed Hassani, Eric Wong 0001. 24985-25020 [doi]
- Causal Interventions Reveal Shared Structure Across English Filler-Gap ConstructionsSasha Boguraev, Christopher Potts, Kyle Mahowald. 25021-25042 [doi]
- TactfulToM: Do LLMs have the Theory of Mind ability to understand White Lies?Yiwei Liu, Emma Jane Pretty, Jiahao Huang, Saku Sugawara. 25043-25061 [doi]
- Don't Sweat the Small Stuff: Segment-Level Meta-Evaluation Based on Pairwise Difference CorrelationColten DiIanni, Daniel Deutsch. 25062-25070 [doi]
- SMART: Simulated Students Aligned with Item Response Theory for Question Difficulty PredictionAlexander Scarlatos, Nigel Fernandez, Christopher Ormerod, Susan Lottridge, Andrew S. Lan. 25071-25094 [doi]
- HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin AmericaGuido Ivetta, Marcos J. Gomez, Sofía Martinelli, Pietro Palombini, Maria Emilia Echeveste, Nair Carolina Mazzeo, Beatriz Busaniche, Luciana Benotti. 25095-25117 [doi]
- WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code GenerationRabiul Awal, Mahsa Massoud, Aarash Feizi, Zichao Li, Suyuchen Wang, Christopher Pal, Aishwarya Agrawal, David Vázquez 0001, Siva Reddy, Juan A. Rodríguez, Perouz Taslakian, Spandana Gella, Sai Rajeswar. 25118-25145 [doi]
- Analyzing values about gendered language reform in LLMs' revisionsJules Watson, Xi Wang, Raymond Liu, Suzanne Stevenson, Barend Beekhuizen. 25146-25161 [doi]
- ALLabel: Three-stage Active Learning for LLM-based Entity Recognition using Demonstration RetrievalZihan Chen, Lei Shi, Weize Wu, Qiji Zhou, Yue Zhang. 25162-25176 [doi]
- HyperKGR: Knowledge Graph Reasoning in Hyperbolic Space with Graph Neural Network Encoding Symbolic PathLihui Liu. 25177-25188 [doi]
- LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge RetrievalYuan Chiang, Elvis Hsieh, Chia-Hong Chou, Janosh Riebesell. 25189-25221 [doi]
- ReSeeding Latent States for Sequential Language UnderstandingStéphane Aroca-Ouellette, Katharina von der Wense, Alessandro Roncone. 25222-25236 [doi]
- DPED: Multi-Layer Noise Distillation for Privacy-Preserving Text EmbeddingsShuya Feng, Yuan Hong. 25237-25245 [doi]
- Identifying & Interactively Refining Ambiguous User Goals for Data Visualization Code GenerationMert Inan, Anthony Sicilia, Alex Xie, Saujas Vaduguru, Daniel Fried, Malihe Alikhani. 25246-25263 [doi]
- Morpheme Induction for Emergent LanguageBrendon Boldt, David R. Mortensen. 25264-25279 [doi]
- Stepwise Informativeness Search for Improving LLM ReasoningSiyuan Wang, Enda Zhao, Xiang Ren 0001. 25280-25298 [doi]
- Social Good or Scientific Curiosity? Uncovering the Research Framing Behind NLP ArtefactsEric Chamoun, Nedjma Ousidhoum, Michael Sejr Schlichtkrull, Andreas Vlachos 0001. 25299-25335 [doi]
- FairGen: Controlling Sensitive Attributes for Fair Generations in Diffusion Models via Adaptive Latent GuidanceMintong Kang, Vinayshekhar Bannihatti Kumar, Shamik Roy, Abhishek Kumar, Sopan Khosla, Balakrishnan Narayanaswamy, Rashmi Gangadharaiah. 25336-25350 [doi]
- Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3DArtemis Panagopoulou, Le Xue, Honglu Zhou, Silvio Savarese, Ran Xu 0001, Caiming Xiong, Chris Callison-Burch, Mark Yatskar, Juan Carlos Niebles. 25351-25365 [doi]
- Proactive Hearing Assistants that Isolate Egocentric ConversationsGuilin Hu, Malek Itani, Tuochao Chen, Shyamnath Gollakota. 25366-25383 [doi]
- fLSA: Learning Semantic Structures in Document Collections Using Foundation ModelsWeijia Xu, Nebojsa Jojic, Nicolas Le Roux. 25384-25395 [doi]
- SafeKey: Amplifying Aha-Moment Insights for Safety ReasoningKaiwen Zhou 0002, Xuandong Zhao, Jayanth Srinivasa, Gaowen Liu, Aosong Feng, Dawn Song, Xin Eric Wang. 25396-25412 [doi]
- HypER: Literature-grounded Hypothesis Generation and Distillation with ProvenanceRosni Vasu, Chandrayee Basu, Bhavana Dalvi Mishra, Cristina Sarasua, Peter Clark, Abraham Bernstein. 25413-25438 [doi]
- Empowering GraphRAG with Knowledge Filtering and IntegrationKai Guo 0003, Harry Shomer, Shenglai Zeng, Haoyu Han 0001, Yu Wang 0160, Jiliang Tang. 25439-25453 [doi]
- Interpretable Mnemonic Generation for Kanji Learning via Expectation-MaximizationJaewook Lee, Alexander Scarlatos, Andrew Lan. 25454-25475 [doi]
- Refining Attention for Explainable and Noise-Robust Fact-Checking with TransformersJean-Flavien Bussotti, Paolo Papotti. 25476-25488 [doi]
- Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic EncodingSeongho Joo, Hyukhun Koh, Kyomin Jung. 25489-25524 [doi]
- Pathway to Relevance: How Cross-Encoders Implement a Semantic Variant of BM25Meng Lu, Catherine Chen 0001, Carsten Eickhoff. 25525-25547 [doi]
- Rewarding the Unlikely: Lifting GRPO Beyond Distribution SharpeningAndre Wang He, Daniel Fried, Sean Welleck. 25548-25560 [doi]
- PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language PairsSana Kang, Myeongseok Gwon, Su Young Kwon, Jaewook Lee, Andrew Lan, Bhiksha Raj, Rita Singh. 25561-25593 [doi]
- Amulet: Putting Complex Multi-Turn Conversations on the Stand with LLM JuriesSahana Ramnath, Anurag Mudgil, Brihi Joshi, Skyler Hallinan, Xiang Ren 0001. 25594-25635 [doi]
- Exploring Chain-of-Thought Reasoning for Steerable Pluralistic AlignmentYunfan Zhang, Kathleen McKeown, Smaranda Muresan. 25636-25649 [doi]
- CMedCalc-Bench: A Fine-Grained Benchmark for Chinese Medical Calculations in LLMYunyan Zhang, Zhihong Zhu, Xian Wu. 25650-25659 [doi]
- Evaluating Robustness of Large Audio Language Models to Audio Injection: An Empirical StudyGuanyu Hou, Jiaming He, Yinhang Zhou, Ji Guo, Yitong Qiao, Rui Zhang, Wenbo Jiang 0001. 25660-25676 [doi]
- How Far Can LLMs Improve from Experience? Measuring Test-Time Learning Ability in LLMs with Human ComparisonJiayin Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang. 25677-25691 [doi]
- Subtle Risks, Critical Failures: A Framework for Diagnosing Physical Safety of LLMs for Embodied Decision MakingYejin Son, Minseo Kim, Sungwoong Kim, Seungju Han, Jian Kim, Dongju Jang, Youngjae Yu, Chan Young Park. 25692-25733 [doi]
- SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model TransformationAurick Qiao, Zhewei Yao, Samyam Rajbhandari, Yuxiong He. 25734-25753 [doi]
- Co-Eval: Augmenting LLM-based Evaluation with Machine MetricsLing-I Wu, Weijie Wu, Minyu Chen, Jianxin Xue, Guoqiang Li. 25754-25776 [doi]
- Sali4Vid: Saliency-Aware Video Reweighting and Adaptive Caption Retrieval for Dense Video CaptioningMinJu Jeon, Si-Woo Kim, Ye Chan Kim, HyunGee Kim, Dong Jin Kim. 25777-25790 [doi]
- Semantic Networks Extracted from Students' Think-Aloud Data are Correlated with Students' Learning PerformancePingjing Yang, Sullam Jeoung, Jennifer Cromley, Jana Diesner. 25791-25804 [doi]
- Less is More: The Effectiveness of Compact Typological Language RepresentationsYork Hay Ng, Phuong-Hanh Hoang, En-Shiun Annie Lee. 25805-25816 [doi]
- Sparse Activation Editing for Reliable Instruction Following in NarrativesRuncong Zhao, Chengyu Cao, Qinglin Zhu, Xiucheng Lyu, Shun Shao, Lin Gui 0003, Ruifeng Xu 0001, Yulan He 0001. 25817-25832 [doi]
- Inceptive Transformers: Enhancing Contextual Representations through Multi-Scale Feature Learning Across Domains and LanguagesAsif Shahriar, Rifat Shahriyar, M. Saifur Rahman. 25833-25848 [doi]
- Causal Tree Extraction from Medical Case Reports: A Novel Task for Experts-like Text ComprehensionSakiko Yahata, Zhen Wan, Fei Cheng 0002, Sadao Kurohashi, Hisahiko Sato, Ryozo Nagai. 25849-25867 [doi]
- OWL: Probing Cross-Lingual Recall of Memorized Texts via World LiteratureAlisha Srivastava, Emir Korukluoglu, Minh Nhat Le, Duyen Tran, Chau Minh Pham, Marzena Karpinska, Mohit Iyyer. 25868-25895 [doi]
- Enhanced Noun-Noun Compound Interpretation through Textual EnrichmentBingyang Ye, Jingxuan Tu, James Pustejovsky. 25896-25911 [doi]
- ICL CIPHERS: Quantifying "Learning" in In-Context Learning via Substitution CiphersZhouxiang Fang, Aayush Mishra, Muhan Gao, Anqi Liu, Daniel Khashabi. 25912-25933 [doi]
- Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction TuningYunhao Gou, Hansi Yang, Zhili Liu, Kai Chen 0023, Yihan Zeng, Lanqing Hong, Zhenguo Li, Qun Liu 0001, Bo Han 0003, James Kwok, Yu Zhang 0006. 25934-25960 [doi]
- Memory OS of AI AgentJiazheng Kang, Mingming Ji, Zhe Zhao, Ting Bai. 25961-25970 [doi]
- Rule Discovery for Natural Language Inference Data Generation Using Out-of-Distribution DetectionJuyoung Han, Hyunsun Hwang, Changki Lee. 25971-25991 [doi]
- Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language ModelsZesen Lyu, Dandan Zhang, Wei Ye, Fangdi Li, Zhihang Jiang, Yao Yang. 25992-26003 [doi]
- Definition Generation for Word Meaning Modeling: Monolingual, Multilingual, and Cross-Lingual PerspectivesFrancesco Periti, Roksana Goworek, Haim Dubossarsky, Nina Tahmasebi. 26004-26024 [doi]
- Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual TransformersJuncheng Wang, Chao Xu 0023, Cheng Yu, Zhe Hu, Haoyu Xie 0002, Guoqi Yu, Lei Shang, Shujun Wang. 26025-26043 [doi]
- HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order OptimizationHuaqin Zhao, Jiaxi Li, Yi Pan 0001, Shizhe Liang, Xiaofeng Yang, Fei Dou, Tianming Liu 0001, Jin Lu 0001. 26044-26067 [doi]
- Zero-shot Multimodal Document Retrieval via Cross-modal Question GenerationYejin Choi 0004, Jae-Woo Park 0003, Janghan Yoon, Saejin Kim, Jaehyun Jeon 0002, Youngjae Yu. 26068-26083 [doi]
- From Parameters to Performance: A Data-Driven Study on LLM Structure and DevelopmentSuqing Wang, Zuchao Li, Luohe Shi, Bo Du 0001, Hai Zhao 0001, Yun Li 0011, Qianren Wang. 26084-26101 [doi]
- Logical Reasoning with Outcome Reward Models for Test-Time ScalingRamya Keerthy Thatikonda, Wray L. Buntine, Ehsan Shareghi. 26102-26112 [doi]
- Speculating LLMs' Chinese Training Data Pollution from Their TokensQingjie Zhang, Di Wang, Haoting Qian, Liu Yan, Tianwei Zhang 0004, Ke Xu 0002, Qi Li 0002, Minlie Huang, Hewu Li, Han Qiu 0001. 26113-26133 [doi]
- NovelHopQA: Diagnosing Multi-Hop Reasoning Failures in Long Narrative ContextsAbhay Gupta, Kevin Zhu, Vasu Sharma, Sean O'Brien, Michael Lu. 26134-26151 [doi]
- Weights-Rotated Preference Optimization for Large Language ModelsChenxu Yang, Ruipeng Jia, Mingyu Zheng, Naibin Gu, Zheng Lin 0001, Siyuan Chen, Weichong Yin, Hua Wu 0003, Weiping Wang 0005. 26152-26175 [doi]
- The Stepwise Deception: Simulating the Evolution from True News to Fake News with LLM AgentsYuhan Liu 0030, Zirui Song, Juntian Zhang, Xiaoqing Zhang, Xiuying Chen, Rui Yan. 26176-26192 [doi]
- How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language ModelsKangtao Lv, Haibin Chen, Yujin Yuan, Langming Liu, Shilei Liu, Yongwei Wang, Wenbo Su, Bo Zheng 0007. 26193-26208 [doi]
- SMEC: Rethinking Matryoshka Representation Learning for Retrieval Embedding CompressionBiao Zhang, Lixin Chen, Tong Liu, Bo Zheng. 26209-26222 [doi]
- Reverse Prompt Engineering: A Zero-Shot, Genetic Algorithm Approach to Language Model InversionHanqing Li, Diego Klabjan. 26223-26245 [doi]
- DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual ReasoningHang Wu, Hongkai Chen, Yujun Cai, Chang Liu 0072, Qingwen Ye, Ming-Hsuan Yang 0001, Yiwei Wang 0001. 26246-26256 [doi]
- SocioBench: Modeling Human Behavior in Sociological Surveys with Large Language ModelsJia Wang, Ziyu Zhao, Tingjuntao Ni, Zhongyu Wei. 26257-26289 [doi]
- Financial Risk Relation Identification through Dual-view AdaptationWei-Ning Chiu, Yu-Hsiang Wang, Andy Hsiao, Yu-Shiang Huang, Chuan-Ju Wang. 26290-26300 [doi]
- CopySpec: Accelerating LLMs with Speculative Copy-and-PasteRazvan Gabriel Dumitru, Minglai Yang 0002, Vikas Yadav, Mihai Surdeanu. 26301-26332 [doi]
- GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model CompressionKainan Liu, Yong Zhang 0058, Ning Cheng 0001, Zhitao Li 0002, Shaojun Wang, Jing Xiao 0006. 26333-26348 [doi]
- GraphAgent: Agentic Graph Language AssistantYuhao Yang 0002, Jiabin Tang, Lianghao Xia, Xingchen Zou, Yuxuan Liang 0002, Chao Huang 0001. 26349-26368 [doi]
- DDO: Dual-Decision Optimization for LLM-Based Medical Consultation via Multi-Agent CollaborationZhihao Jia, Mingyi Jia, Junwen Duan, Jian-Xin Wang. 26369-26386 [doi]
- FedMABench: Benchmarking Mobile GUI Agents on Decentralized Heterogeneous User DataWenhao Wang, Zijie Yu, Rui Ye 0001, Jianqing Zhang, Guangyi Liu, Liang Liu, Siheng Chen, Yanfeng Wang 0001. 26387-26408 [doi]
- VLA-Mark: A cross modal watermark for large vision-language alignment modelsShuliang Liu, Zheng Qi, Jesse Jiaxi Xu, Yibo Yan, Junyan Zhang, He Geng, Aiwei Liu, Peijie Jiang, Jia Liu, Yik-Cheung Tam, Xuming Hu. 26409-26427 [doi]
- Sentence Smith: Controllable Edits for Evaluating Text EmbeddingsHongji Li, Andrianos Michail, Reto Gubelmann, Simon Clematide, Juri Opitz. 26428-26445 [doi]
- ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical ReasoningYu Sun, Xingyu Qian, Weiwen Xu, Hao Zhang 0098, Chenghao Xiao, Long Li, Deli Zhao, Wenbing Huang 0001, Tingyang Xu, Qifeng Bai, Yu Rong 0001. 26446-26467 [doi]
- Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense RetrievalSeongwan Park, Taeklim Kim, Youngjoong Ko. 26468-26485 [doi]
- UICOMPASS: UI Map Guided Mobile Task Automation via Adaptive Action GenerationYuanzhang Lin, Zhe Zhang, He Rui, Qingao Dong, Mingyi Zhou, Jing Zhang, Xiang Gao 0012, Hailong Sun 0001. 26486-26506 [doi]
- Leaky Thoughts: Large Reasoning Models Are Not Private ThinkersTommaso Green, Martin Gubri, Haritz Puerto, Sangdoo Yun, Seong Joon Oh. 26507-26529 [doi]
- Model Unlearning via Sparse Autoencoder Subspace Guided ProjectionsXu Wang 0033, Zihao Li, Benyou Wang, Yan Hu, Difan Zou. 26530-26546 [doi]
- ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement LearningChangtai Zhu, Siyin Wang, Ruijun Feng, Kai Song, Xipeng Qiu. 26547-26564 [doi]
- How to Make Large Language Models Generate 100% Valid Molecules?Wen Tao, Jing Tang 0004, Alvin Chan, Bryan Hooi, Baolong Bi, Nanyun Peng 0001, Yuansheng Liu, Yiwei Wang 0001. 26565-26580 [doi]
- Exploring Quality and Diversity in Synthetic Data Generation for Argument MiningJianzhu Bao, Yuqi Huang, Yang Sun, Wenya Wang, Yice Zhang, Bojun Jin, Ruifeng Xu 0001. 26581-26604 [doi]
- Dynamic Jointly Batch Selection for Data Efficient Machine Translation Fine-TuningMohammad Amin Ghanizadeh, Mohammad Javad Dousti. 26605-26613 [doi]
- 3MDBench: Medical Multimodal Multi-agent Dialogue BenchmarkIvan Sviridov, Amina Miftakhova, Artemiy Tereshchenko, Galina Zubkova, Pavel Blinov, Andrey V. Savchenko. 26614-26654 [doi]
- OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and AttributionLucio La Cava, Andrea Tagarelli. 26655-26671 [doi]
- CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error ScenariosShiting Huang, Zhen Fang, Zehui Chen, Siyu Yuan, Junjie Ye 0005, Yu Zeng, Lin Chen 0019, Qi Mao, Feng Zhao 0004. 26672-26704 [doi]
- Pre-trained Language Models Learn Remarkably Accurate Representations of NumbersMarek Kadlcík, Michal Stefánik, Timothee Mickus, Josef Kuchar, Michal Spiegel. 26705-26714 [doi]
- Enhancing Large Vision-Language Models with Ultra-Detailed Image Caption GenerationYu Zeng, Yukun Qi, Yiming Zhao, Xikun Bao, Lin Chen 0026, Zehui Chen, Shiting Huang, Jie Zhao, Feng Zhao 0004. 26715-26741 [doi]
- Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware DeferralAntónio Farinhas, Nuno Miguel Guerreiro, Sweta Agrawal, Ricardo Rei, André F. T. Martins. 26742-26756 [doi]
- iVISPAR - An Interactive Visual-Spatial Reasoning Benchmark for VLMsJulius Mayer 0001, Mohamad Ballout, Serwan Jassim, Farbod Nosrat Nezami, Elia Bruni. 26757-26781 [doi]
- Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model PerformanceOmer Nahum, Nitay Calderon, Orgad Keller, Idan Szpektor, Roi Reichart. 26782-26809 [doi]
- Detecting Legal Citations in United Kingdom Court JudgmentsHolli Sargeant, Andreas Östling, Måns Magnusson. 26810-26836 [doi]
- Large Language Models Badly Generalize across Option Length, Problem Types, and Irrelevant Noun ReplacementsGuangxiang Zhao, Saier Hu, Xiaoqi Jian, Jinzhu Wu, Yuhan Wu 0001, Lin Sun 0010, Xiangzheng Zhang. 26837-26846 [doi]
- Studying the Role of Input-Neighbor Overlap in Retrieval-Augmented Language Models Training EfficiencyEhsan Doostmohammadi, Marco Kuhlmann. 26847-26856 [doi]
- Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task PerformancePedro Henrique Luz de Araujo, Paul Röttger, Dirk Hovy, Benjamin Roth 0001. 26857-26886 [doi]
- HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter MergingTaha Ceritli, Ondrej Bohdal, Mete Ozay, Jijoong Moon, Kyeng-Hun Lee, Hyeonmok Ko, Umberto Michieli. 26887-26909 [doi]
- Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for ReasoningSenjie Jin, Lu Chen 0001, Zhiheng Xi, Yuhui Wang, Sirui Song, Yuhao Zhou 0005, Xinbo Zhang, Peng Sun 0006, Hong Lu, Tao Gui, Qi Zhang 0001, Xuanjing Huang 0001. 26910-26927 [doi]
- Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed AcceptanceSongsheng Wang, Rucheng Yu, Zhihang Yuan, Chao Yu 0005, Feng Gao, Yu Wang 0002, Derek F. Wong. 26928-26940 [doi]
- Leveraging Text-to-Text Transformers as Classifier Chain for Few-Shot Multi-Label ClassificationQuang Anh Nguyen, Nadi Tomeh, Mustapha Lebbah, Thierry Charnois, Hanane Azzag. 26941-26950 [doi]
- M-Wanda: Improving One-Shot Pruning for Multilingual LLMsRochelle Choenni, Ivan Titov 0001. 26951-26964 [doi]
- Beyond Hate Speech: NLP's Challenges and Opportunities in Uncovering Dehumanizing LanguageHamidreza Saffari, Mohammadamin Shafiei, Hezhao Zhang, Lasana T. Harris, Nafise Sadat Moosavi. 26965-26980 [doi]
- Conflict-Aware Soft Prompting for Retrieval-Augmented GenerationEunseong Choi, June Park, Hyeri Lee, Jongwuk Lee. 26981-26995 [doi]
- R-CHAR: A Metacognition-Driven Framework for Role-Playing in Large Language ModelsHaiming Qin, Jiwei Zhang 0020, Wei Zhang 0242, Kezhong Lu, Mingyang Zhou 0001, Hao Liao, Rui Mao 0001. 26996-27014 [doi]
- Annotating Training Data for Conditional Semantic Textual Similarity Measurement using Large Language ModelsGaifan Zhang, Yi Zhou 0019, Danushka Bollegala. 27015-27027 [doi]
- When Words Smile: Generating Diverse Emotional Facial Expressions from TextHaidong Xu, Meishan Zhang, Hao Ju, Zhedong Zheng, Erik Cambria, Min Zhang 0005, Hao Fei 0001. 27028-27046 [doi]
- Improving Online Job Advertisement Analysis via Compositional Entity ExtractionKai Krüger, Johanna Binnewitt, Kathrin Ehmann, Stefan Winnige, Alan Akbik. 27047-27065 [doi]
- Correlation-Aware Example Selection for In-Context Learning with Nonsymmetric Determinantal Point ProcessesQiunan Du, Zhiliang Tian, Zhen Huang 0006, Kailun Bian, Tianlun Liu, Zhaoning Zhang 0001, Xinwang Liu 0002, Feng Liu, Dong Sheng Li 0001. 27066-27082 [doi]
- Leveraging Cognitive Complexity of Texts for Contextualization in Dense RetrievalEffrosyni Sokli, Georgios Peikos, Pranav Kasela, Gabriella Pasi. 27083-27096 [doi]
- Beyond Online Sampling: Bridging Offline-to-Online Alignment via Dynamic Data Transformation for LLMsZhang Zhang, Guhao Feng, Jian Guan, Di He 0001, Wei Wu. 27097-27109 [doi]
- CAVE: Detecting and Explaining Commonsense Anomalies in Visual EnvironmentsRishika Bhagwatkar, Syrielle Montariol, Angelika Romanou, Beatriz Borges, Irina Rish, Antoine Bosselut. 27110-27151 [doi]
- Enhancing LLM Language Adaption through Cross-lingual In-Context Pre-trainingLinjuan Wu, Haoran Wei, Huan Lin, Tianhao Li, Baosong Yang, Fei Huang 0002, Weiming Lu 0001. 27152-27166 [doi]
- SemVink: Advancing VLMs' Semantic Understanding of Optical Illusions via Visual Global ThinkingSifan Li, Yujun Cai, Yiwei Wang 0001. 27167-27177 [doi]
- Order Doesn't Matter, But Reasoning Does: Training LLMs with Order-Centric AugmentationQianxi He, Qianyu He, Jiaqing Liang, Weikang Zhou, Zeye Sun, Fei Yu, Yanghua Xiao. 27178-27192 [doi]
- Type-Less yet Type-Aware Inductive Link Prediction with Pretrained Language ModelsAlessandro De Bellis, Salvatore Bufi, Giovanni Servedio, Vito Walter Anelli, Tommaso Di Noia, Eugenio Di Sciascio. 27193-27209 [doi]
- Extracting Linguistic Information from Large Language Models: Syntactic Relations and Derivational KnowledgeTsedeniya Kinfe Temesgen, Marion Di Marco, Alexander Fraser. 27210-27226 [doi]
- Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model ReasoningQianxi He, QingYu Ren, Shanzhe Lei, Xuhong Wang, Yingchun Wang. 27227-27243 [doi]
- TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking AgentDominik Meier, Jan Philip Wahle, Paul Röttger, Terry Ruas, Bela Gipp. 27244-27261 [doi]
- Frequency & Compositionality in Emergent CommunicationJean-Baptiste Sevestre, Emmanuel Dupoux. 27262-27274 [doi]
- Summarizing Speech: A Comprehensive SurveyFabian Retkowski, Maike Züfle, Andreas Sudmann, Dinah Pfau, Shinji Watanabe 0001, Jan Niehues, Alexander Waibel. 27275-27306 [doi]
- CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based RewardsCheng Liu, Yifei Lu, Fanghua Ye 0004, Jian Li, Xingyu Chen, Feiliang Ren, Zhaopeng Tu, Xiaolong Li. 27307-27336 [doi]
- Assay2Mol: Large Language Model-based Drug Design Using BioAssay ContextYifan Deng, Spencer S. Ericksen, Anthony Gitter. 27337-27362 [doi]
- Frame First, Then Extract: A Frame-Semantic Reasoning Pipeline for Zero-Shot Relation Triplet ExtractionZehan Li, Fu Zhang, Wenqing Zhang, Jiawei Li, Zhou Li, Jingwei Cheng, Tianyue Peng. 27363-27376 [doi]
- MrGuard: A Multilingual Reasoning Guardrail for Universal LLM SafetyYahan Yang, Soham Dan, Shuo Li, Dan Roth 0001, Insup Lee 0001. 27377-27396 [doi]
- TALON: A Multi-Agent Framework for Long-Table Exploration and Question AnsweringRuochun Jin, Xiyue Wang, Dong Wang, Haoqi Zheng, Yunpeng Qi, Silin Yang, Meng Zhang. 27397-27413 [doi]
- You Are What You Train: Effects of Data Composition on Training Context-aware Machine Translation ModelsPawel Maka, Yusuf Can Semerci, Jan Scholtes, Gerasimos Spanakis. 27414-27437 [doi]
- Improving Neutral Point-of-View Generation with Data- and Parameter-Efficient RLJessica Hoffmann, Christiane Ahlheim, Zac Yu, Aria Walfrand, Jarvis Jin, Marie Tano, Ahmad Beirami, Erin MacMurray van Liemt, Nithum Thain, Hakim Sidahmed, Lucas Dixon. 27438-27467 [doi]
- Randomized Smoothing Meets Vision-Language ModelsEmmanouil Seferis, Changshun Wu, Stefanos Kollias, Saddek Bensalem, Chih-Hong Cheng. 27468-27478 [doi]
- PIIvot: A Lightweight NLP Anonymization Framework for Question-Anchored Tutoring DialoguesMatthew Zent, Digory Smith, Simon Woodhead 0002. 27479-27488 [doi]
- Trustworthy Medical Question Answering: An Evaluation-Centric SurveyYinuo Wang, Baiyang Wang, Robert E. Mercer, Frank Rudzicz, Sudipta Singha Roy, Pengjie Ren, Zhumin Chen, Xindi Wang 0001. 27489-27502 [doi]
- Let Alone: Human-Scale Models Generalize to a Rare Construction in Form but not MeaningWesley Scivetti, Tatsuya Aoyama, Ethan Wilcox, Nathan Schneider 0001. 27503-27514 [doi]
- BOUQuET : dataset, Benchmark and Open initiative for Universal Quality Evaluation in TranslationPierre Andrews, Mikel Artetxe, Mariano Coria Meglioli, Marta R. Costa-Jussà, Joe Chuang, David Dale, Mark Duppenthaler, Nathanial Paul Ekberg, Cynthia Gao, Daniel Edward Licht, Jean Maillard, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Eduardo Sánchez, Ioannis Tsiamas, Arina Turkatenko, Albert Ventayol-Boada, Shireen Yates. 27515-27535 [doi]
- HealthCards: Exploring Text-to-Image Generation as Visual Aids for Healthcare Knowledge Democratizing and EducationQian Wu, Zheyao Gao, Longfei Gou, Yifan Hou, Ann Sin Nga Lau, Qi Dou 0001. 27536-27558 [doi]
- When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMsAmmar Khairi, Daniel D'Souza, Ye Shen, Julia Kreutzer, Sara Hooker. 27559-27583 [doi]
- Creativity in LLM-based Multi-Agent Systems: A SurveyYi-Cheng Lin, Kang-Chieh Chen, Zhe-Yan Li, Tzu-Heng Wu, Tzu-Hsuan Wu, Kuan-Yu Chen 0005, Hung-yi Lee, Yun-Nung Chen. 27584-27607 [doi]
- Context and POS in Action: A Comparative Study of Chinese Homonym Disambiguation in Human and Language ModelsChenwei Xie, Matthew King-Hang Ma, Wenbo Wang, William Shi-Yuan Wang. 27608-27625 [doi]
- Attacking Misinformation Detection Using Adversarial Examples Generated by Language ModelsPiotr Przybyla, Euan McGill, Horacio Saggion. 27626-27642 [doi]
- Leveraging Loanword Constraints for Improving Machine Translation in a Low-Resource Multilingual ContextFelermino D. M. A. Ali, Henrique Lopes Cardoso, Rui Sousa-Silva. 27643-27657 [doi]
- Linguistic Neuron Overlap Patterns to Facilitate Cross-lingual Transfer on Low-resource LanguagesYuemei Xu, Kexin Xu, Jian Zhou, Ling Hu, Lin Gui. 27658-27673 [doi]
- Scaling Low-Resource MT via Synthetic Data Generation with LLMsOna de Gibert, Joseph Attieh, Teemu Vahtola, Mikko Aulamo, Zihao Li, Raúl Vázquez, Tiancheng Hu, Jörg Tiedemann. 27674-27692 [doi]
- Tailoring Table Retrieval from a Field-aware Hybrid Matching PerspectiveDa Li 0003, Keping Bi, Jiafeng Guo, Xueqi Cheng. 27693-27704 [doi]
- Randomly Removing 50% of Dimensions in Text Embeddings has Minimal Impact on Retrieval and Classification TasksSotaro Takeshita, Yurina Takeshita, Daniel Ruffinelli, Simone Paolo Ponzetto. 27705-27726 [doi]
- Morables: A Benchmark for Assessing Abstract Moral Reasoning in LLMs with FablesMatteo Marcuzzo, Alessandro Zangari, Andrea Albarelli, José Camacho-Collados, Mohammad Taher Pilehvar. 27727-27751 [doi]
- MessIRve: A Large-Scale Spanish Information Retrieval DatasetFrancisco Valentini, Viviana Cotik, Damián Ariel Furman, Ivan Bercovich, Edgar Altszyler, Juan Manuel Pérez. 27752-27769 [doi]
- AFRIDOC-MT: Document-level MT Corpus for African LanguagesJesujoba Oluwadara Alabi, Israel Abebe Azime, Miaoran Zhang, Cristina España-Bonet, Rachel Bawden, Dawei Zhu, David Ifeoluwa Adelani, Clement Odoje, Idris Akinade, Iffat Maab, Davis David, Shamsuddeen Hassan Muhammad, Neo Putini, David O. Ademuyiwa, Andrew Caines, Dietrich Klakow. 27770-27806 [doi]
- Charting the Landscape of African NLP: Mapping Progress and Shaping the Road AheadJesujoba Oluwadara Alabi, Michael A. Hedderich, David Ifeoluwa Adelani, Dietrich Klakow. 27807-27841 [doi]
- GLIMPSE: Do Large Vision-Language Models Truly Think With Videos or Just Glimpse at Them?Yiyang Zhou, Linjie Li, Shi Qiu 0016, Zhengyuan Yang, Yuyang Zhao, Siwei Han, Yangfan He, Kangqi Li, Haonian Ji, Zihao Zhao, Haibo Tong, Lijuan Wang, Huaxiu Yao. 27842-27856 [doi]
- Social Bias in Multilingual Language Models: A SurveyLance Calvin Lim Gamboa, Yue Feng, Mark G. Lee. 27857-27880 [doi]
- BYOKG-RAG: Multi-Strategy Graph Retrieval for Knowledge Graph Question AnsweringCostas Mavromatis, Soji Adeshina, Vassilis N. Ioannidis, Zhen Han, Qi Zhu 0008, Ian Robinson, Bryan Thompson 0001, Huzefa Rangwala, George Karypis. 27881-27898 [doi]
- Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical TextAvijit Mitra, Zhichao Yang 0001, Emily Druhl, Raelene Goodwin, Hong Yu 0001. 27899-27935 [doi]
- Pun Unintended: LLMs and the Illusion of Humor UnderstandingAlessandro Zangari, Matteo Marcuzzo, Andrea Albarelli, Mohammad Taher Pilehvar, José Camacho-Collados. 27936-27971 [doi]
- RACCooN: Versatile Instructional Video Editing with Auto-Generated NarrativesJaehong Yoon, Shoubin Yu, Mohit Bansal. 27972-28008 [doi]
- Pre-trained Models Perform the Best When Token Distributions Follow Zipf's LawYanjin He, Qingkai Zeng, Meng Jiang. 28009-28021 [doi]
- Do RAG Systems Really Suffer From Positional Bias?Florin Cuconasu, Simone Filice, Guy Horowitz, Yoelle Maarek, Fabrizio Silvestri. 28022-28036 [doi]
- Aspect-Oriented Summarization for Psychiatric Short-Term Readmission PredictionWonjin Yoon, Boyu Ren, Spencer Thomas, Chanhwi Kim, Guergana Savova, Mei-Hua Hall, Tim Miller 0002. 28037-28054 [doi]
- Adapting Bias Evaluation to Domain Contexts using Generative ModelsTamara Quiroga, Felipe Bravo-Marquez, Valentin Barriere. 28055-28066 [doi]
- Emergent morpho-phonological representations in self-supervised speech modelsJon Gauthier, Canaan Breiss, Matthew K. Leonard, Edward F. Chang. 28067-28086 [doi]
- Multilingual Language Model Pretraining using Machine-translated DataJiayi Wang, Yao Lu, Maurice Weber, Max Ryabinin, David Ifeoluwa Adelani, Yihong Chen, Raphael Tang, Pontus Stenetorp. 28087-28107 [doi]
- IntentionFrame: A Semi-Structured, Multi-Aspect Framework for Fine-Grained Conversational Intention UnderstandingJinggui Liang, Dung Vo 0002, Lizi Liao. 28108-28125 [doi]
- Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video ReasoningZiyang Wang, Jaehong Yoon, Shoubin Yu, Md Mohaiminul Islam, Gedas Bertasius, Mohit Bansal. 28126-28140 [doi]
- Efficient Compositional Multi-tasking for On-device Large Language ModelsOndrej Bohdal, Mete Ozay, Jijoong Moon, Kyeng-Hun Lee, Hyeonmok Ko, Umberto Michieli. 28141-28165 [doi]
- Improving Large Language Model Safety with Contrastive Representation LearningSamuel Simko, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin 0001. 28166-28194 [doi]
- Leveraging What's Overfixed: Post-Correction via LLM Grammatical Error OvercorrectionTaehee Park, Heejin Do, Gary Lee 0001. 28195-28207 [doi]
- Scaling Up Temporal Domain Generalization via Temporal Experts AveragingAoming Liu, Kevin Miller, Venkatesh Saligrama, Kate Saenko, Boqing Gong, Ser-Nam Lim, Bryan A. Plummer. 28208-28231 [doi]
- LinguaLens: Towards Interpreting Linguistic Mechanisms of Large Language Models via Sparse Auto-EncoderYi Jing, Zijun Yao 0002, Hongzhu Guo, Lingxu Ran, Xiaozhi Wang, Lei Hou 0001, Juanzi Li. 28232-28251 [doi]
- The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language ModelsAdrian Cosma, Stefan Ruseti, Emilian Radoi, Mihai Dascalu. 28252-28263 [doi]
- Improving the Quality of Web-mined Parallel Corpora of Low-Resource Languages using Debiasing HeuristicsAloka Fernando, Nisansa de Silva, Menan Velayuthan, Charitha Rathnayake, Surangika Ranathunga. 28264-28281 [doi]
- Weaver: Interweaving SQL and LLM for Table ReasoningRohit Khoja, Devanshu Gupta, Yanjie Fu, Dan Roth 0001, Vivek Gupta 0001. 28282-28308 [doi]
- ECO Decoding: Entropy-Based Control for Controllability and Fluency in Controllable Dialogue GenerationSeungmin Shin, Dooyoung Kim, Youngjoong Ko. 28309-28321 [doi]
- Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzlesAntara Raaghavi Bhattacharya, Isabel Papadimitriou, Kathryn Davidson, David Alvarez-Melis. 28322-28332 [doi]
- Unsupervised Concept Vector Extraction for Bias Control in LLMsHannah Cyberey, Yangfeng Ji, David Evans 0001. 28333-28355 [doi]
- Seeing the Same Story Differently: Framing-Divergent Event Coreference for Computational Framing AnalysisJin Zhao, Xinrui Hu, Nianwen Xue. 28356-28371 [doi]
- LLMs are Better Than You Think: Label-Guided In-Context Learning for Named Entity RecognitionFan Bai 0006, Hamid Hassanzadeh, Ardavan Saeedi, Mark Dredze. 28372-28392 [doi]
- COUNTDOWN: Contextually Sparse Activation Filtering Out Unnecessary Weights in Down ProjectionJaewon Cheon, Pilsung Kang 0001. 28393-28409 [doi]
- SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative RefinementChelsi Jain, Yiran Wu, Yifan Zeng, Jiale Liu, Shengyu Dai, Zhenwen Shao, Qingyun Wu, Huazheng Wang. 28410-28427 [doi]
- VLP: Vision-Language Preference Learning for Embodied ManipulationRunze Liu 0002, Chenjia Bai, Jiafei Lyu, Shengjie Sun 0002, Yali Du 0001, Xiu Li 0001. 28428-28444 [doi]
- QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal ModelsKuei-Chun Kao, Hsu Tzu-Yin, Yunqi Hong, Ruochen Wang, Cho-Jui Hsieh. 28445-28460 [doi]
- EGOILLUSION: Benchmarking Hallucinations in Egocentric Video UnderstandingAshish Seth, Utkarsh Tyagi, Ramaneswaran Selvakumar, Nishit Anand, Sonal Kumar, Sreyan Ghosh, Ramani Duraiswami, Chirag Agarwal, Dinesh Manocha. 28461-28480 [doi]
- MULTIVOX: A Benchmark for Evaluating Voice Assistants for Multimodal InteractionsRamaneswaran Selvakumar, Ashish Seth, Nishit Anand, Utkarsh Tyagi, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha. 28481-28493 [doi]
- Do All Autoregressive Transformers Remember Facts the Same Way? A Cross-Architecture Analysis of Recall MechanismsMinyeong Choe, Haehyun Cho, Changho Seo, Hyunil Kim. 28494-28513 [doi]
- Probing Narrative Morals: A New Character-Focused MFT Framework for Use with Large Language ModelsLuca Mitran, Sophie Wu, Andrew Piper. 28514-28529 [doi]
- Probing and Boosting Large Language Models Capabilities via Attention HeadsDezhi Zhao, Xin Liu, Xiaocheng Feng, Hui Wang, Bing Qin. 28530-28544 [doi]
- A Survey of Link Prediction in N-ary Knowledge GraphsJiyao Wei, Saiping Guan, Da Li 0003, Zhongni Hou, Miao Su, Yucan Guo, Xiaolong Jin 0001, Jiafeng Guo, Xueqi Cheng. 28545-28567 [doi]
- Multi-Frequency Contrastive Decoding: Alleviating Hallucinations for Large Vision-Language ModelsBingqian Liu, Fu Zhang, Guoqing Chen, Jingwei Cheng. 28568-28584 [doi]
- ORPP: Self-Optimizing Role-playing Prompts to Enhance Language Model CapabilitiesYifan Duan, Yihong Tang, Kehai Chen, Liqiang Nie, Min Zhang 0005. 28585-28600 [doi]
- BrailleLLM: Braille Instruction Tuning with Large Language Models for Braille Domain TasksTianyuan Huang, Zepeng Zhu, Hangdi Xing, Zirui Shao, Zhi Yu, Chaoxiong Yang, Jiaxian He, Xiaozhong Liu, Jiajun Bu. 28601-28612 [doi]
- MAviS: A Multimodal Conversational Assistant For Avian SpeciesYevheniia Kryklyvets, Mohammed Irfan Kurpath, Sahal Shaji Mullappilly, Jinxing Zhou, Fahad Shahbaz Khan, Rao Muhammad Anwer, Salman Khan 0001, Hisham Cholakkal. 28613-28639 [doi]
- Refining Text Generation for Realistic Conversational Recommendation via Direct Preference OptimizationManato Tajiri, Michimasa Inaba. 28640-28661 [doi]
- Large Language Models Threaten Language's Epistemic and Communicative FoundationsShashank Srivastava. 28662-28676 [doi]
- Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based InferenceZhuo Chen, Xinyu Wang 0013, Yong Jiang 0005, Zhen Zhang, Xinyu Geng, Pengjun Xie, Fei Huang 0002, Kewei Tu. 28677-28692 [doi]
- Multi-view-guided Passage Reranking with Large Language ModelsJeongwoo Na, Jun Kwon, Eunseong Choi, Jongwuk Lee. 28693-28706 [doi]
- Disentangling Subjectivity and Uncertainty for Hate Speech Annotation and Modeling using GazeÖzge Alaçam, Sanne Hoeken, Andreas Säuberli, Hannes Gröner, Diego Frassinelli, Sina Zarrieß, Barbara Plank. 28707-28724 [doi]
- VoiceBBQ: Investigating Effect of Content and Acoustics in Social Bias of Spoken Language ModelJunhyuk Choi, Ro-hoon Oh, Jihwan Seol, Bugeun Kim. 28725-28736 [doi]
- Explaining Differences Between Model Pairs in Natural Language through Sample LearningAdvaith Malladi, Rakesh R. Menon, Yuvraj Jain, Shashank Srivastava. 28737-28759 [doi]
- Compound AI Systems Optimization: A Survey of Methods, Challenges, and Future DirectionsYu-Ang Lee, Guan-Ting Yi, Mei-yi Liu, Jui-Chao Lu, Guan-Bo Yang, Yun-Nung Chen. 28760-28775 [doi]
- A Multi-Level Benchmark for Causal Language Understanding in Social Media DiscourseXiaohan Ding, Kaike Ping, Buse Çarik, Eugenia Ha Rim Rho. 28776-28790 [doi]
- Causal Representation Learning from Multimodal Clinical Records under Non-Random Modality MissingnessZiHan Liang, Ziwen Pan, Ruoxuan Xiong. 28791-28808 [doi]
- XLQA: A Benchmark for Locale-Aware Multilingual Open-Domain Question AnsweringKeon-Woo Roh, Yeong-Joon Ju, Seong-Whan Lee. 28809-28821 [doi]
- Transformer-Based Temporal Information Extraction and Application: A ReviewXin Su 0008, Phillip Howard, Steven Bethard. 28822-28841 [doi]
- How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit MisinformationRuohao Guo, Wei Xu 0004, Alan Ritter. 28842-28861 [doi]
- AmpleHate: Amplifying the Attention for Versatile Implicit Hate DetectionYejin Lee, Joonghyuk Hahn, Hyeseon Ahn, Yo-Sub Han. 28862-28874 [doi]
- Can Large Language Models Act as Ensembler for Multi-GNNs?Hanqi Duan, Yao Cheng, Jianxiang Yu, Yao Liu, Xiang Li. 28875-28894 [doi]
- Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language ModelsYounwoo Choi, Changling Li, Yongjin Yang, Zhijing Jin 0001. 28895-28928 [doi]
- From Charts to Fair Narratives: Uncovering and Mitigating Geo-Economic Biases in Chart-to-TextRidwan Mahbub, Mohammed Saidul Islam, Mir Tafseer Nayeem, Md. Tahmid Rahman Laskar, Mizanur Rahman, Shafiq Joty, Enamul Hoque. 28929-28947 [doi]
- Real-time Ad Retrieval via LLM-generative Commercial Intention for Sponsored Search AdvertisingTongtong Liu, Zhaohui Wang, Meiyue Qin, Zenghui Lu, Xudong Chen, Yuekui Yang, Peng Shu. 28948-28960 [doi]
- Toward Efficient Sparse Autoencoder-Guided Steering for Improved In-Context Learning in Large Language ModelsIkhyun Cho, Julia Hockenmaier. 28961-28973 [doi]
- CLMTracing: Black-box User-level Watermarking for Code Language Model TracingBoyu Zhang, Ping He, Tianyu Du, Xuhong Zhang 0002, Lei Yun, Kingsum Chow, Jianwei Yin. 28974-28990 [doi]
- The Good, the Bad and the Constructive: Automatically Measuring Peer Review's Utility for AuthorsAbdelrahman Sadallah, Tim Baumgärtner, Iryna Gurevych, Ted Briscoe. 28991-29021 [doi]
- Evolving Chinese Spelling Correction with Corrector-Verifier CollaborationLinfeng Liu 0003, Hongqiu Wu, Hai Zhao 0001. 29022-29028 [doi]
- M2Edit: Locate and Edit Multi-Granularity Knowledge in Multimodal Large Language ModelYang Zhou, Pengfei Cao, Yubo Chen 0001, Qingbin Liu, Dianbo Sui, Xi Chen 0003, Kang Liu 0001, Jun Zhao 0001. 29029-29042 [doi]
- Do LLMs Behave as Claimed? Investigating How LLMs Follow Their Own Claims using Counterfactual QuestionsHaochen Shi, Shaobo Li, Guoqing Chao, Xiaoliang Shi, Wentao Chen, Zhenzhou Ji. 29043-29056 [doi]
- Multilingual vs Crosslingual Retrieval of Fact-Checked Claims: A Tale of Two ApproachesAlan Ramponi, Marco Rovera, Róbert Móro, Sara Tonelli. 29057-29076 [doi]
- How Much Do LLMs Hallucinate across Languages? On Realistic Multilingual Estimation of LLM HallucinationSaad Obaid ul Islam, Anne Lauscher, Goran Glavas. 29077-29098 [doi]
- LiTransProQA: An LLM-based Literary Translation Evaluation Metric with Professional Question AnsweringRan Zhang, Wei Zhao, Lieve Macken, Steffen Eger. 29099-29121 [doi]
- Improving Handshape Representations for Sign Language Processing: A Graph Neural Network ApproachAlessa Carbo, Eric T. Nalisnick. 29122-29135 [doi]
- Instructing Large Language Models for Low-Resource Languages: A Systematic Study for BasqueOscar Sainz, Naiara Pérez, Julen Etxaniz, Joseba Fernandez de Landa, Itziar Aldabe, Iker García-Ferrero, Aimar Zabala, Ekhi Azurmendi, German Rigau, Eneko Agirre, Mikel Artetxe, Aitor Soroa. 29136-29160 [doi]
- SOCIAL SCAFFOLDS: A Generalization Framework for Social Understanding TasksRitam Dutt, Carolyn P. Rosé, Maarten Sap. 29161-29197 [doi]
- Beyond A Single AI Cluster: A Survey of Decentralized LLM TrainingHaotian Dong, Jingyan Jiang, Rongwei Lu, Jiajun Luo, Jiajun Song, Bowen Li, Ying Shen, Zhi Wang 0001. 29198-29212 [doi]
- Can LLM Agents Maintain a Persona in Discourse?Pranav Bhandari, Nicolas Fay, Michael J. Wise, Amitava Datta, Stephanie Meek, Usman Naseem, Mehwish Nasim. 29213-29229 [doi]
- Iterative Multilingual Spectral Attribute ErasureShun Shao, Yftah Ziser, Zheng Zhao 0005, Yifu Qiu, Shay B. Cohen, Anna Korhonen. 29230-29255 [doi]
- TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability ResearchAbir Harrasse, Philip Quirke, Clement Neo, Dhruv Nathawani, Luke Marks, Amir Abdullah. 29256-29284 [doi]
- SCRIBE: Structured Chain Reasoning for Interactive Behaviour Explanations using Tool CallingFares Fawzi, Vinitra Swamy, Dominik Glandorf, Tanya Nazaretsky, Tanja Käser. 29285-29310 [doi]
- Logit Space Constrained Fine-Tuning for Mitigating Hallucinations in LLM-Based Recommender SystemsJianfeng Deng, Qingfeng Chen, Debo Cheng, Jiuyong Li, Lin Liu 0003. 29311-29324 [doi]
- PACHAT: Persona-Aware Speech Assistant for Multi-party DialogueDongjie Fu, Xize Cheng, Linjun Li, Xiaoda Yang, Lujia Yang, Tao Jin 0004. 29325-29342 [doi]
- Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from JailbreakingJunda Zhu 0003, Lingyong Yan, Shuaiqiang Wang, Dawei Yin 0001, Lei Sha. 29343-29361 [doi]
- Graph-Guided Textual Explanation Generation FrameworkShuzhou Yuan, Jingyi Sun, Ran Zhang, Michael Färber 0001, Steffen Eger, Pepa Atanasova, Isabelle Augenstein. 29362-29386 [doi]
- The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate ItLeonardo Bertolazzi, Philipp Mondorf, Barbara Plank, Raffaella Bernardi. 29387-29424 [doi]
- A Causal Lens for Evaluating Faithfulness MetricsKerem Zaman, Shashank Srivastava. 29425-29449 [doi]
- Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long ContextsYifei Yu, Qian-Wen Zhang, Lingfeng Qiao, Di Yin, Fang Li, Jie Wang, Chen Zeng Xi, Suncong Zheng, Xiaolong Liang, Xing Sun 0001. 29450-29468 [doi]
- FISTAPruner: Layer-wise Post-training Pruning for Large Language ModelsPengxiang Zhao, Hanyu Hu, Ping Li 0057, Yi Zheng, Zhefeng Wang, Xiaoming Yuan 0001. 29469-29487 [doi]
- Do LLMs Encode Frame Semantics? Evidence from Frame IdentificationJayanth Krishna Chundru, Rudrashis Poddar, Jie Cao 0010, Tianyu Jiang. 29488-29500 [doi]
- StepER: Step-wise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language ModelsKyumin Lee, Minjin Jeon, Sanghwan Jang, Hwanjo Yu. 29501-29523 [doi]
- How Does DPO Reduce Toxicity? A Mechanistic Neuron-Level AnalysisYushi Yang, Filip Sondej, Harry Mayne, Andrew Lee, Adam Mahdi. 29524-29543 [doi]
- It's All About In-Context Learning! Teaching Extremely Low-Resource Languages to LLMsYue Li, Zhixue Zhao, Carolina Scarton. 29544-29559 [doi]
- Where to show Demos in Your Prompt: A Positional Bias of In-Context LearningKwesi A. Cobbina, Tianyi Zhou. 29560-29593 [doi]
- Multilingual Pretraining for Pixel Language ModelsIlker Kesen, Jonas F. Lotz, Ingo Ziegler, Phillip Rust, Desmond Elliott. 29594-29611 [doi]
- MetaFaith: Faithful Natural Language Uncertainty Expression in LLMsGabrielle Kaili-May Liu, Gal Yona, Avi Caciularu, Idan Szpektor, Tim G. J. Rudner, Arman Cohan. 29612-29656 [doi]
- Machine-generated text detection prevents language model collapseGeorge Drayson, Emine Yilmaz, Vasileios Lampos. 29657-29673 [doi]
- Data-Efficient Hate Speech Detection via Cross-Lingual Nearest Neighbor Retrieval with Limited Labeled DataFaeze Ghorbanpour, Daryna Dementieva, Alexander Fraser 0001. 29674-29692 [doi]
- V-VAE: A Variational Auto Encoding Framework Towards Fine-Grained Control over Human-Like ChatQi Lin, Weikai Xu, Lisi Chen 0001, Bin Dai. 29693-29706 [doi]
- Mixture of Languages: Improved Multilingual Encoders Through Language GroupingJoão Maria Janeiro, Belen Alastruey, Francisco Massa, Maha Elbayad, Benjamin Piwowarski, Patrick Gallinari, Loïc Barrault. 29707-29722 [doi]
- Too Helpful, Too Harmless, Too Honest or Just Right?Gautam Siddharth Kashyap, Mark Dras, Usman Naseem. 29723-29734 [doi]
- Cardiverse: Harnessing LLMs for Novel Card Game PrototypingDanrui Li, Sen Zhang, Samuel S. Sohn, Kaidong Hu, Muhammad Usman 0010, Mubbasir Kapadia. 29735-29762 [doi]
- Assessing effective de-escalation of crisis conversations using transformer-based models and trend statisticsIgnacio J. Tripodi, Greg Buda, Margaret Meagher, Elizabeth A. Olson. 29763-29777 [doi]
- Measuring and Mitigating Media Outlet Name Bias in Large Language ModelsSeong-Jin Park, Kang Min Kim. 29778-29797 [doi]
- The Good, the Bad, and the Debatable: A Survey on the Impacts of Data for In-Context LearningStephanie Schoch, Yangfeng Ji. 29798-29812 [doi]
- Where Confabulation Lives: Latent Feature Discovery in LLMsThibaud Ardoin, Yi Cai 0005, Gerhard Wunder. 29813-29837 [doi]
- Analysing Chain of Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation?Samuel Lewis-Lim, Xingwei Tan, Zhixue Zhao, Nikolaos Aletras. 29838-29853 [doi]
- Playpen: An Environment for Exploring Learning From Dialogue Game FeedbackNicola Horst, Davide Mazzaccara, Antonia Schmidt, Michael Sullivan, Filippo Momentè, Luca Franceschetti, Philipp Sadler, Sherzod Hakimov, Alberto Testoni, Raffaella Bernardi, Raquel Fernández, Alexander Koller, Oliver Lemon, David Schlangen, Mario Giulianelli, Alessandro Suglia. 29854-29891 [doi]
- GenLink: Generation-Driven Schema-Linking via Multi-Model Learning for Text-to-SQLZhifeng Hao, Junqi Huang, Shaobin Shi, Ruichu Cai, Boyan Xu. 29892-29905 [doi]
- TSVer: A Benchmark for Fact Verification Against Time-Series EvidenceMarek Strong, Andreas Vlachos 0001. 29906-29926 [doi]
- Cross-MoE: An Efficient Temporal Prediction Framework Integrating Textual ModalityRuizheng Huang, Zhicheng Zhang, Yong Wang. 29927-29938 [doi]
- Sparse Autoencoder Features for Classifications and TransferabilityJack Gallifant, Shan Chen 0004, Kuleen Sasse, Hugo J. W. L. Aerts, Thomas Hartvigsen, Danielle S. Bitterman. 29939-29963 [doi]
- KGE Calibrator: An Efficient Probability Calibration Method of Knowledge Graph Embedding Models for Trustworthy Link PredictionYang Yang 0008, Mohan Timilsina, Edward Curry. 29964-29987 [doi]
- LCES: Zero-shot Automated Essay Scoring via Pairwise Comparisons Using Large Language ModelsTakumi Shibata, Yuichi Miyamura. 29988-30001 [doi]
- The Arabic Generality Score: Another Dimension of Modeling Arabic DialectnessSanad Shaban, Nizar Habash. 30002-30013 [doi]
- Lemmatization as a Classification Task: Results from Arabic across Multiple GenresMostafa Saeed, Nizar Habash. 30014-30029 [doi]
- A Comprehensive Framework to Operationalize Social Stereotypes for Responsible AI EvaluationsAida Mostafazadeh Davani, Sunipa Dev, Héctor Pérez-Urbina, Vinodkumar Prabhakaran. 30030-30043 [doi]
- Correct-Detect: Balancing Performance and Ambiguity Through the Lens of Coreference Resolution in LLMsAmber Shore, Russell Scheinberg, Ameeta Agrawal, So Young Lee. 30044-30058 [doi]
- GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content DetectionMelissa Kazemi Rad, Alberto Purpura, Himanshu Kumar, Emily Chen, Mohammad Shahed Sorower. 30059-30077 [doi]
- LaMDAgent: An Autonomous Framework for Post-Training Pipeline Optimization via LLM AgentsTaro Yano, Yoichi Ishibashi, Masafumi Oyamada. 30078-30095 [doi]
- Finetuning LLMs for Human Behavior Prediction in Social Science ExperimentsAkaash Kolluri, Shengguang Wu, Joon-Sung Park, Michael S. Bernstein. 30096-30111 [doi]
- How Private are Language Models in Abstractive Summarization?Anthony Hughes, Nikolaos Aletras, Ning Ma 0002. 30112-30130 [doi]
- Expectation Preference Optimization: Reliable Preference Estimation for Improving the Reasoning Capability of Large Language ModelsZelin Li, Dawei Song. 30131-30146 [doi]
- Split-Merge: Scalable and Memory-Efficient Merging of Expert LLMsSruthi Gorantla, Aditya Rawal, Devamanyu Hazarika, Kaixiang Lin, Mingyi Hong 0001, Mahdi Namazifar. 30147-30166 [doi]
- Model Consistency as a Cheap yet Predictive Proxy for LLM Elo ScoresAshwin Ramaswamy, Nestor Demeure, Ermal Rrapaj. 30167-30175 [doi]
- Plutus: Benchmarking Large Language Models in Low-Resource Greek FinanceXueqing Peng, Triantafillos Papadopoulos, Efstathia Soufleri, Polydoros Giannouris, Ruoyu Xiang, Yan Wang 0015, Lingfei Qian, Jimin Huang, Qianqian Xie, Sophia Ananiadou. 30176-30202 [doi]
- TaxoAlign: Scholarly Taxonomy Generation Using Language ModelsAvishek Lahiri, Yufang Hou, Debarshi Kumar Sanyal. 30203-30223 [doi]
- DiNaM: Disinformation Narrative Mining with Large Language ModelsWitold Sosnowski, Arkadiusz Modzelewski, Kinga Skorupska, Adam Wierzbicki. 30224-30251 [doi]
- VeriLocc: End-to-End Cross-Architecture Register Allocation via LLMLeSheng Jin, Zhenyuan Ruan, Haohui Mai, Jingbo Shang. 30252-30262 [doi]
- MemeIntel: Explainable Detection of Propagandistic and Hateful MemesMohamed Bayan Kmainasi, Abul Hasnat 0001, Md. Arid Hasan, Ali Ezzat Shahroor, Firoj Alam. 30263-30279 [doi]
- FLUID QA: A Multilingual Benchmark for Figurative Language Usage in Dialogue across English, Chinese, and KoreanSeoyoon Park, Hyeji Choi, Minseon Kim, Subin An, Xiaonan Wang, Gyuri Choi, Hansaem Kim. 30280-30294 [doi]
- Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation FrameworkMohna Chakraborty, Lu Wang 0008, David Jurgens. 30295-30323 [doi]
- VerIF: Verification Engineering for Reinforcement Learning in Instruction FollowingHao Peng 0015, Yunjia Qi, Xiaozhi Wang, Bin Xu 0001, Lei Hou 0001, Juanzi Li. 30324-30339 [doi]
- UNCLE: Benchmarking Uncertainty Expressions in Long-Form GenerationRuihan Yang, Caiqi Zhang, Zhisong Zhang, Xinting Huang, Dong Yu 0001, Nigel Collier, Deqing Yang. 30340-30356 [doi]
- Enhancing Study-Level Inference from Clinical Trial Papers via Reinforcement Learning-Based Numeric ReasoningMassimiliano Pronesti, Michela Lorandi, Paul Flanagan, Oisin Redmond, Anya Belz, Yufang Hou 0001. 30357-30373 [doi]
- Context-aware Biases for Length ExtrapolationAli Veisi, Hamidreza Amirzadeh, Amir Mansourian. 30374-30395 [doi]
- AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-ScientistsYifei Li 0005, Hanane Nour Moussa, Ziru Chen, Shijie Chen, Botao Yu, Mingyi Xue 0001, Benjamin Burns, Tzu-Yao Chiu, Vishal Dey, Zitong Lu, Chen Wei, Qianheng Zhang, Tianyu Zhang, Song Gao 0001, Xuhui Huang, Xia Ning, Nesreen K. Ahmed, Ali Payani, Huan Sun 0001. 30396-30418 [doi]
- Finding your MUSE: Mining Unexpected Solutions EngineNir Sweed, Hanit Hakim, Ben Wolfson, Hila Lifshitz, Dafna Shahaf. 30419-30434 [doi]
- Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMsYao Fu, Xianxuan Long, Runchao Li, Haotian Yu, Mu Sheng, Xiaotian Han, Yu Yin, Pan Li 0001. 30435-30458 [doi]
- Leveraging Knowledge Graph-Enhanced LLMs for Context-Aware Medical ConsultationSu-Hyeong Park, Ho-Beom Kim, Seong-Jin Park, Dinara Aliyeva, Kang Min Kim. 30459-30475 [doi]
- Reflective Agreement: Combining Self-Mixture of Agents with a Sequence Tagger for Robust Event ExtractionFatemeh Haji, Mazal Bethany, Cho-Yu Jason Chiang, Anthony Rios, Peyman Najafirad. 30476-30492 [doi]
- Simple Yet Effective: An Information-Theoretic Approach to Multi-LLM Uncertainty QuantificationMaya Kruse, Majid Afshar, Saksham Khatwani, Anoop M. Mayampurath, Guanhua Chen, Yanjun Gao. 30493-30504 [doi]
- Exploring morphology-aware tokenization: A case study on Spanish language modelingAlba Táboas García, Piotr Przybyla, Leo Wanner. 30505-30518 [doi]
- Studying Rhetorically Ambiguous QuestionsOghenevovwe Ikumariegbe, Eduardo Blanco 0002, Ellen Riloff. 30519-30529 [doi]
- Estimating LLM Consistency: A User Baseline vs Surrogate MetricsXiaoyuan Wu, Weiran Lin, Omer Akgul, Lujo Bauer. 30530-30544 [doi]
- Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark StudyDonggeon Lee, Joonwon Jang, Jihae Jeong, Hwanjo Yu. 30545-30588 [doi]
- Improving Rule-based Reasoning in LLMs using Neurosymbolic RepresentationsVarun Dhanraj, Chris Eliasmith. 30589-30608 [doi]
- Can LLMs Extract Frame-Semantic Arguments?Jacob Daniel Devasier, Rishabh Mediratta, Chengkai Li 0001. 30609-30622 [doi]
- Accelerated Test-Time Scaling with Model-Free Speculative SamplingWoomin Song, Saket Dingliwal, Sai Muralidhar Jayanthi, Bhavana Ganesh, Jinwoo Shin, Aram Galstyan, Sravan Babu Bodapati. 30623-30636 [doi]
- Enhancing RLHF with Human Gaze ModelingKarim Galliamov, Ivan Titov, Ilya Pershin. 30637-30643 [doi]
- Mapping semantic networks to Dutch word embeddings as a diagnostic tool for cognitive declineMaithe van Noort, Michal Korenar, Jelke Bloem. 30644-30659 [doi]
- CausalVLBench: Benchmarking Visual Causal Reasoning in Large Vision-Language ModelsAneesh Komanduri, Karuna Bhaila, Xintao Wu. 30660-30680 [doi]
- Implicit Behavioral Alignment of Language Agents in High-Stakes Crowd SimulationsYunzhe Wang, Gale M. Lucas, Burcin Becerik-Gerber, Volkan Ustun. 30681-30698 [doi]
- Are Language Models Consequentialist or Deontological Moral Reasoners?Keenan Samway, Max Kleiman-Weiner, David Guzman Piedrahita, Rada Mihalcea, Bernhard Schölkopf, Zhijing Jin 0001. 30699-30726 [doi]
- PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent ClaimsYongmin Yoo, Qiongkai Xu, Longbing Cao. 30727-30746 [doi]
- All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other TokensSiddarth Mamidanna, Daking Rai, Ziyu Yao 0002, Yilun Zhou. 30747-30760 [doi]
- A Position Paper on the Automatic Generation of Machine Learning LeaderboardsRoelien C. Timmer, Yufang Hou 0001, Stephen Wan 0001. 30761-30784 [doi]
- SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language ModelsAmirHossein Dabiri Aghdam, Lele Wang. 30785-30806 [doi]
- SERVAL: Surprisingly Effective Zero-Shot Visual Document Retrieval Powered by Large Vision and Language ModelsThong Nguyen 0004, Yibin Lei, Jia-Huei Ju, Andrew Yates. 30807-30822 [doi]
- Meta-Semantics Augmented Few-Shot Relational LearningHan Wu, Jie Yin. 30823-30835 [doi]
- ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction TuningRui Wang 0095, BoHao Li, Xiyang Dai, Jianwei Yang, Yi-ling Chen, Zhen Xing, Yifan Yang 0005, Dongdong Chen 0001, Xipeng Qiu, Zuxuan Wu, Yu-Gang Jiang 0001. 30836-30849 [doi]
- ModelCitizens: Representing Community Voices in Online SafetyAshima Suvarna, Christina Chance, Karolina Naranjo, Hamid Palangi, Sophie Hao, Thomas Hartvigsen, Saadia Gabriel. 30850-30866 [doi]
- UnifiedVisual: A Framework for Constructing Unified Vision-Language DatasetsPengyu Wang 0006, Shaojun Zhou, Chenkun Tan, Xinghao Wang, Wei Huang, Zhen Ye 0006, Zhaowei Li, Botian Jiang, Dong Zhang, Xipeng Qiu. 30867-30899 [doi]
- The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue SupportSuhas BN, Yash Mahajan, Dominik Mattioli, Andrew M. Sherrill, Rosa I. Arriaga, Christopher W. Wiese, Saeed Abdullah. 30900-30922 [doi]
- Is Cognition Consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document UnderstandingZirui Shao, Feiyu Gao, Zhaoqing Zhu, Chuwei Luo, Hangdi Xing, Zhi Yu, Qi Zheng 0002, Ming Yan 0008, Jiajun Bu. 30923-30944 [doi]
- AutoCT: Automating Interpretable Clinical Trial Prediction with LLM AgentsFengze Liu, Haoyu Wang 0005, Joonhyuk Cho, Dan Roth 0001, Andrew Lo. 30945-30970 [doi]
- MMDocIR: Benchmarking Multimodal Retrieval for Long DocumentsKuicai Dong, Yujing Chang, Derrick-Goh-Xin Deik, Dexun Li, Ruiming Tang, Yong Liu 0020. 30971-31005 [doi]
- Program of Thoughts for Financial Reasoning: Leveraging Dynamic In-Context Examples and Generative RetrievalSubhendu Khatuya, Shashwat Naidu, Pawan Goyal 0002, Niloy Ganguly. 31006-31018 [doi]
- Waste-Bench: A Comprehensive Benchmark for Evaluating VLLMs in Cluttered EnvironmentsMuhammad Ali, Salman Khan. 31019-31032 [doi]
- Demystifying Domain-adaptive Post-training for Financial LLMsZixuan Ke, Yifei Ming, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty. 31033-31059 [doi]
- HICode: Hierarchical Inductive Coding with LLMsMian Zhong, Pristina Wang, Anjalie Field. 31060-31078 [doi]
- Cacheback: Speculative Decoding With Nothing But CacheZhiyao Ma, In Gim, Lin Zhong 0001. 31079-31084 [doi]
- MA-DPR: Manifold-aware Distance Metrics for Dense Passage RetrievalYifan Liu, Qianfeng Wen, Mark Zhao, Jiazhou Liang, Scott Sanner. 31085-31103 [doi]
- LLM-Guided Co-Training for Text ClassificationMd Mezbaur Rahman, Cornelia Caragea. 31104-31121 [doi]
- LeanK: Learnable K Cache Channel Pruning for Efficient DecodingYike Zhang, Zhiyuan He, Huiqiang Jiang, Chengruidong Zhang, Yuqing Yang 0001, Jianyong Wang 0001, Lili Qiu. 31122-31137 [doi]
- DELOC: Document Element LocalizerHammad A. Ayyubi, Puneet Mathur, Md. Mehrab Tanjim, Vlad I. Morariu. 31138-31147 [doi]
- NL2Lean: Translating Natural Language into Lean 4 through Multi-Aspect Reinforcement LearningYue Fang 0001, Shaohan Huang, Xin Yu, Haizhen Huang, Zihan Zhang, Weiwei Deng, Furu Wei, Feng Sun 0008, Qi Zhang 0066, Zhi Jin 0001. 31148-31158 [doi]
- A Multilingual, Culture-First Approach to Addressing Misgendering in LLM ApplicationsSunayana Sitaram, Adrian de Wynter, Isobel McCrum, Qilong Gu, Si-Qing Chen. 31159-31183 [doi]
- X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought ReasoningPrasanna Reddy Pulakurthi, Jiamian Wang, Majid Rabbani, Sohail A. Dianat, Raghuveer Rao, Zhiqiang Tao. 31184-31195 [doi]
- Token-level Proximal Policy Optimization for Query GenerationYichen Ouyang, Lu Wang 0029, Fangkai Yang, Pu Zhao 0004, Chenghua Huang, Jianfeng Liu, Bochen Pang, Yaming Yang 0001, Yuefeng Zhan, Hao Sun 0015, Qingwei Lin, Saravan Rajmohan, Weiwei Deng, Dongmei Zhang 0001, Feng Sun 0008. 31196-31210 [doi]
- Prior Prompt Engineering for Reinforcement Fine-TuningPittawat Taveekitworachai, Potsawee Manakul, Sarana Nutanong, Kunat Pipatanakul. 31211-31236 [doi]
- Beyond WER: Probing Whisper's Sub-token Decoder Across Diverse Language Resource LevelsSiyu Liang, Nicolas Ballier, Gina-Anne Levow, Richard A. Wright. 31237-31247 [doi]
- ThinkTuning: Instilling Cognitive Reflections without DistillationAswin RRV, Jacob Dineen, Divij Handa, Md Nayem Uddin, Mihir Parmar, Chitta Baral, Ben Zhou. 31248-31262 [doi]
- Droid: A Resource Suite for AI-Generated Code DetectionDaniil Orel, Indraneil Paul, Iryna Gurevych, Preslav Nakov. 31263-31289 [doi]
- LoRACoE: Improving Large Language Model via Composition-based LoRA ExpertGuanyu Li, Zhiheng Xi, Zhihao Zhang 0002, Boyang Hong, Tao Gui, Qi Zhang 0001, Xuanjing Huang 0001. 31290-31304 [doi]
- Same Question, Different Words: A Latent Adversarial Framework for Prompt RobustnessTingchen Fu, Fazl Barez. 31305-31319 [doi]
- Pluralistic Alignment for Healthcare: A Role-Driven FrameworkJiayou Zhong, Anudeex Shetty, Chao Jia, Xuanrui Lin, Usman Naseem. 31320-31343 [doi]
- Flexible-length Text Infilling for Discrete Diffusion ModelsAndrew Zhang, Anushka Sivakumar, Chia-Wei Tang, Chris Thomas 0001. 31344-31359 [doi]
- Beyond the Leaderboard: Understanding Performance Disparities in Large Language Models via Model DiffingSabri Boughorbel, Fahim Dalvi, Nadir Durrani, Majd Hawasly. 31360-31371 [doi]
- Explicit Learning and the LLM in Machine TranslationMalik Marmonier, Rachel Bawden, Benoît Sagot. 31372-31422 [doi]
- Towards Language-Agnostic STIPA: Universal Phonetic Transcription to Support Language Documentation at ScaleJacob Lee Suchardt, Hana El-Shazli, Pierluigi Cassotti. 31423-31439 [doi]
- Beyond Pairwise: Global Zero-shot Temporal Graph GenerationAlon Eirew, Kfir Bar, Ido Dagan. 31440-31458 [doi]
- "Feels Feminine to Me": Understanding Perceived Gendered Style through Human AnnotationsHongyu Chen, Neele Falk, Michael Roth, Agnieszka Falenska. 31459-31480 [doi]
- RALS: Resources and Baselines for Romanian Automatic Lexical SimplificationFabian Anghel, Cristea Petru-Theodor, Claudiu Creanga, Sergiu Nisioi. 31481-31492 [doi]
- How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and AnalysisHerun Wan, Minnan Luo, Zihan Ma 0001, Guang Dai, Xiang Zhao 0002. 31493-31516 [doi]
- Are Stereotypes Leading LLMs' Zero-Shot Stance Detection ?Anthony Dubreuil, Antoine Gourru, Christine Largeron, Amine Trabelsi. 31517-31530 [doi]
- Multi-Modal Framing Analysis of NewsArnav Arora, Srishti Yadav, Maria Antoniak, Serge J. Belongie, Isabelle Augenstein. 31531-31553 [doi]
- TempParaphraser: "Heating Up" Text to Evade AI-Text Detection through ParaphrasingJunjie Huang, Ruiquan Zhang, Jinsong Su, Yidong Chen. 31554-31573 [doi]
- ComicScene154: A Scene Dataset for Comic AnalysisSandro Paval, Pascal Meißner, Ivan P. Yamshchikov. 31574-31580 [doi]
- MedLinkDE - MedDRA Entity Linking for German with Guided Chain of Thought ReasoningRoman Christof, Farnaz Zeidi, Manuela Messelhäußer, Dirk Mentzer, Renate König, Liam Harold Childs, Alexander Mehler. 31581-31593 [doi]
- HookMoE: A learnable performance compensation strategy of Mixture-of-Experts for LLM inference accelerationLongkai Cheng, Along He, Mulin Li, Xueshuo Xie, Tao Li 0022. 31594-31606 [doi]
- Cross-Document Cross-Lingual NLI via RST-Enhanced Graph Fusion and Interpretability PredictionMengying Yuan, Wenhao Wang, Zixuan Wang, Yujie Huang, Kangli Wei, Fei Li, Chong Teng, Donghong Ji. 31607-31629 [doi]
- 3R: Enhancing Sentence Representation Learning via Redundant Representation ReductionLongxuan Ma, Xiao Wu, Yuxin Huang, Shengxiang Gao, Zhengtao Yu. 31630-31643 [doi]
- When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMsAbhirama Subramanyam Penamakuri, Navlika Singh, Piyush Arora, Anand Mishra 0001. 31644-31661 [doi]
- ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and WisdomJingqi Zhou, Sheng Wang, Jingwei Dong, Kai Liu, Lei Li, Jiahui Gao, Jiyue Jiang, Lingpeng Kong, Chuan Wu. 31662-31691 [doi]
- Extractive Fact Decomposition for Interpretable Natural Language Inference in one Forward PassNicholas Popovic, Michael Färber 0001. 31692-31705 [doi]
- Structure-Conditional Minimum Bayes Risk DecodingBryan Eikema, Anna Rutkiewicz, Mario Giulianelli. 31706-31723 [doi]
- Label Set Optimization via Activation Distribution Kurtosis for Zero-Shot Classification with Generative ModelsYue Li, Zhixue Zhao, Carolina Scarton. 31724-31741 [doi]
- The Transfer Neurons Hypothesis: An Underlying Mechanism for Language Latent Space Transitions in Multilingual LLMsHinata Tezuka, Naoya Inoue. 31742-31792 [doi]
- VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics ExpressionsThu Phuong Nguyen, Duc M. Nguyen, Hyotaek Jeon, Hyunwook Lee, Hyunmin Song, Sungahn Ko, Taehwan Kim. 31793-31813 [doi]
- All Roads Lead to Rome: Graph-Based Confidence Estimation for Large Language Model ReasoningCaiqi Zhang, Chang Shu, Ehsan Shareghi, Nigel Collier. 31814-31824 [doi]
- SEMMA: A Semantic Aware Knowledge Graph Foundation ModelArvindh Arun, Sumit Kumar, Mojtaba Nayyeri, Bo Xiong 0001, Ponnurangam Kumaraguru, Antonio Vergari, Steffen Staab. 31825-31848 [doi]
- Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from TextMizanur Rahman, Md. Tahmid Rahman Laskar, Shafiq Joty, Enamul Hoque. 31849-31874 [doi]
- Predicting Prosodic Boundaries for Children's TextsMansi Dhamne, Sneha Raman, Preeti Rao. 31875-31885 [doi]
- Enhancing Logical Reasoning in Language Models via Symbolically-Guided Monte Carlo Process SupervisionXingwei Tan, Marco Valentino, Mahmud Elahi Akhter, Maria Liakata, Nikolaos Aletras. 31886-31900 [doi]
- Can Large Language Models Outperform Non-Experts in Poetry Evaluation? A Comparative Study Using the Consensual Assessment TechniquePiotr Sawicki 0001, Marek Grzes, Dan Brown 0001, Fabrício Góes. 31901-31918 [doi]
- Beyond Human Labels: A Multi-Linguistic Auto-Generated Benchmark for Evaluating Large Language Models on Resume ParsingZijian Ling, Han Zhang 0027, Jiahao Cui, Zhequn Wu, Xu Sun 0002, Guohao Li, Xiangjian He. 31919-31945 [doi]
- Orthogonal Finetuning Made ScalableZeju Qiu, Weiyang Liu, Adrian Weller, Bernhard Schölkopf. 31946-31963 [doi]
- AIR: Complex Instruction Generation via Automatic Iterative RefinementWei Liu, Yancheng He, Yu Li, Hui Huang 0021, Chengwei Hu, Jiaheng Liu, Shilong Li, Wenbo Su, Bo Zheng 0007. 31964-31986 [doi]
- SQUiD: Synthesizing Relational Databases from Unstructured TextMushtari Sadia, Zhenning Yang, Yunming Xiao, Ang Chen 0001, Amrita Roy Chowdhury 0001. 31987-32012 [doi]
- RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware ReasoningYu Wang 0093, Shiwan Zhao, Zhihu Wang, Ming Fan 0002, Xicheng Zhang, Yubo Zhang 0006, Zhengfan Wang, Heyuan Huang, Ting Liu 0002. 32013-32037 [doi]
- Rapid Word Learning Through Meta In-Context LearningWentao Wang, Guangyuan Jiang, Tal Linzen, Brenden M. Lake. 32038-32073 [doi]
- EuroGEST: Investigating gender stereotypes in multilingual language modelsJacqueline Rowe, Mateusz Klimaszewski, Liane Guillou, Shannon Vallor, Alexandra Birch. 32074-32096 [doi]
- How Persuasive Is Your Context?Tu Nguyen, Kevin Du, Alexander Miserlis Hoyle, Ryan Cotterell. 32097-32123 [doi]
- The Medium Is Not the Message: Deconfounding Document Embeddings via Linear Concept ErasureYu Fan 0007, Yang Tian, Shauli Ravfogel, Mrinmaya Sachan, Elliott Ash, Alexander Miserlis Hoyle. 32124-32143 [doi]
- Measuring scalar constructs in social science with LLMsHauke Licht, Rupak Sarkar, Patrick Y. Wu, Pranav Goel 0001, Niklas Stoehr, Elliott Ash, Alexander Miserlis Hoyle. 32144-32171 [doi]
- Text Detoxification: Data Efficiency, Semantic Preservation and Model GeneralizationJing Yu, Yibo Zhao 0005, Jiapeng Zhu 0002, Wenming Shao, Bo Pang, Zhao Zhang 0011, Xiang Li 0067. 32172-32186 [doi]
- Not What the Doctor Ordered: Surveying LLM-based De-identification and Quantifying Clinical Information LossKiana Aghakasiri, Noopur Zambare, JoAnn Thai, Carrie Ye, Mayur Mehta, J. Ross Mitchell, Mohamed Abdalla. 32187-32203 [doi]
- Reasoning under Uncertainty: Efficient LLM Inference via Unsupervised Confidence Dilution and Convergent Adaptive SamplingZhenning Shi, Yijia Zhu, Yi Xie, Junhan Shi, Guorui Xie, Haotian Zhang, Yong Jiang 0001, Congcong Miao, Qing Li 0006. 32204-32218 [doi]
- Africa Health Check: Probing Cultural Bias in Medical LLMsCharles Nimo, Shuheng Liu, Irfan Essa, Michael L. Best. 32219-32232 [doi]
- Assumed Identities: Quantifying Gender Bias in Machine Translation of Gender-Ambiguous Occupational TermsOrfeas Menis-Mastromichalakis, Giorgos Filandrianos, Maria Symeonaki, Giorgos Stamou. 32233-32249 [doi]
- REVIVING YOUR MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model DiffingAly M. Kassem, Zhuan Shi, Negar Rostamzadeh, Golnoosh Farnadi. 32250-32263 [doi]
- ToM-SSI: Evaluating Theory of Mind in Situated Social InteractionsMatteo Bortoletto, Constantin Ruhdorfer, Andreas Bulling. 32264-32289 [doi]
- Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?Grgur Kovac, Jérémy Perez, Rémy Portelas, Peter Ford Dominey, Pierre-Yves Oudeyer. 32290-32309 [doi]
- Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Ambiguous Prompts and Unanswerable QuestionsHazel Kim, Tom A. Lamb, Adel Bibi, Philip Torr 0001, Yarin Gal. 32310-32322 [doi]
- Extending Automatic Machine Translation Evaluation to Book-Length DocumentsKuang-Da Wang, Shuoyang Ding, Chao-Han Huck Yang, Ping-Chun Hsieh, Wen-Chih Peng, Vitaly Lavrukhin, Boris Ginsburg. 32323-32339 [doi]
- MedFact: A Large-scale Chinese Dataset for Evidence-based Medical Fact-checking of LLM ResponsesTong Chen 0005, Zimu Wang, Yiyi Miao, Haoran Luo, Yuanfei Sun, Wei Wang 0042, Zhengyong Jiang, Procheta Sen, Jionglong Su. 32340-32353 [doi]
- VideoPASTA: 7K Preference Pairs That Matter for Video-LLM AlignmentYogesh Kulkarni, Pooyan Fazli. 32354-32379 [doi]
- Do LLMs Adhere to Label Definitions? Examining Their Receptivity to External Label DefinitionsSeyedali Mohammadi, Bhaskara Hanuma Vedula, Hemank Lamba, Edward Raff, Ponnurangam Kumaraguru, Francis Ferraro, Manas Gaur. 32380-32393 [doi]
- Group-Aware Reinforcement Learning for Output Diversity in Large Language ModelsOron Anschel, Alon Shoshan, Adam Botach, Shunit Haviv Hakimi, Asaf Gendler, Emanuel Ben Baruch, Nadav Bhonker, Igor Kviatkovsky, Manoj Aggarwal, Gérard G. Medioni. 32394-32415 [doi]
- Model-Based Ranking of Source Languages for Zero-Shot Cross-Lingual TransferAbteen Ebrahimi, Adam Wiemerslage, Katharina von der Wense. 32416-32461 [doi]
- PruneCD: Contrasting Pruned Self Model to Improve Decoding FactualityByeongho Yu, Changhun Lee, Jungyu Jin, Eunhyeok Park. 32462-32473 [doi]
- Crisp: Cognitive Restructuring of Negative Thoughts through Multi-turn Supportive DialoguesJinfeng Zhou, Yuxuan Chen, Jianing Yin, Yongkang Huang, Yihan Shi, Xikun Zhang 0008, Libiao Peng, Rongsheng Zhang, Tangjie Lv, Zhipeng Hu, Hongning Wang, Minlie Huang. 32474-32503 [doi]
- AccessEval: Benchmarking Disability Bias in Large Language ModelsSrikant Panda, Amit Agarwal, Hitesh Laxmichand Patel. 32504-32530 [doi]
- The Impact of Language Mixing on Bilingual LLM ReasoningYihao Li, Jiayi Xin, Miranda Muqing Miao, Qi Long, Lyle H. Ungar. 32531-32548 [doi]
- VISaGE: Understanding Visual Generics and ExceptionsStella Frank, Emily Allaway. 32549-32558 [doi]
- Stronger Baselines for Retrieval-Augmented Generation with Long-Context Language ModelsAlex Laitenberger, Christopher D. Manning, Nelson F. Liu. 32559-32569 [doi]
- Discursive Circuits: How Do Language Models Understand Discourse Relations?Yisong Miao, Min-Yen Kan. 32570-32589 [doi]
- Making VLMs More Robot-Friendly: Self-Critical Distillation of Low-Level Procedural ReasoningChan Young Park, Jillian Fisher, Marius Memmel, Dipika Khullar, Seoho Yun, Abhishek Gupta, Yejin Choi. 32590-32611 [doi]
- ThinkSLM: Towards Reasoning in Small Language ModelsGaurav Srivastava, Shuxiang Cao, Xuan Wang. 32612-32662 [doi]
- MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for ReasoningJustin Chih-Yao Chen, Archiki Prasad, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal. 32663-32686 [doi]
- Batched Self-Consistency Improves LLM Relevance Assessment and RankingAnton Korikov, Pan Du, Scott Sanner, Navid Rekabsaz. 32687-32703 [doi]
- SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific AbstractsMarc Felix Brinner, Sina Zarrieß. 32704-32719 [doi]
- Controlled Generation for Private Synthetic TextZihao Zhao, Anjalie Field. 32720-32735 [doi]
- Towards AI-Assisted Psychotherapy: Emotion-Guided Generative InterventionsKilichbek Haydarov, Youssef Mohamed, Emilio Goldenhersch, Paul OCallaghan, Li-Jia Li, Mohamed Elhoseiny. 32736-32755 [doi]
- From Shortcuts to Balance: Attribution Analysis of Speech-Text Feature Utilization in Distinguishing Original from Machine-Translated TextsYongjian Chen, Antonio Toral. 32756-32763 [doi]
- DEBATE, TRAIN, EVOLVE: Self-Evolution of Language Model ReasoningGaurav Srivastava, Zhenyu Bi, Meng Lu, Xuan Wang 0008. 32764-32810 [doi]
- From Chat Logs to Collective Insights: Aggregative Question AnsweringWentao Zhang, Woojeong Kim, Yuntian Deng. 32811-32850 [doi]
- A Text-Based Recommender System that Leverages Explicit Affective State PreferencesTonmoy Hasan, Razvan C. Bunescu. 32851-32865 [doi]
- CARE: Multilingual Human Preference Learning for Cultural AwarenessGeyang Guo, Tarek Naous, Hiromi Wakaki, Yukiko Nishimura, Yuki Mitsufuji, Alan Ritter, Wei Xu 0004. 32866-32895 [doi]
- Multilingual Dialogue Generation and Localization with Dialogue Act ScriptingJustin Vasselli, Eunike Andriani Kardinata, Yusuke Sakai 0010, Taro Watanabe. 32896-32911 [doi]
- SUE: Sparsity-based Uncertainty Estimation via Sparse Dictionary LearningTamás Ficsor, Gábor Berend. 32912-32929 [doi]
- Planning-Aware Code Infilling via Horizon-Length PredictionYifeng Ding, Hantian Ding, Shiqi Wang 0002, Qing Sun 0013, Varun Kumar, Zijian Wang 0002. 32930-32942 [doi]
- SinhalaMMLU: A Comprehensive Benchmark for Evaluating Multitask Language Understanding in SinhalaAshmari Pramodya, Nirasha Nelki, Heshan Shalinda, Chamila Liyanage, Yusuke Sakai 0010, Randil Pushpananda, Ruvan Weerasinghe, Hidetaka Kamigaito, Taro Watanabe. 32943-32961 [doi]
- OG-RAG: Ontology-grounded retrieval-augmented generation for large language modelsKartik Sharma, Peeyush Kumar, Yunqing Li. 32962-32981 [doi]
- Convergence and Divergence of Language Models under Different Random SeedsFinlay Fehlauer, Kyle Mahowald, Tiago Pimentel. 32982-32991 [doi]
- Analyzing and Modeling LLM Response Lengths with Extreme Value Theory: Anchoring Effects and Hybrid DistributionsLiuxuan Jiao, Chen Gao, Yiqian Yang, Chenliang Zhou, Yixian Huang, Xinlei Chen, Yong Li. 32992-33002 [doi]
- Language Models Identify Ambiguities and Exploit LoopholesJio Choi, Mohit Bansal, Elias Stengel-Eskin. 33003-33018 [doi]
- Benchmarking LLMs for Translating Classical Chinese Poetry: Evaluating Adequacy, Fluency, and EleganceAndong Chen, Lianzhang Lou, Kehai Chen, Xuefeng Bai 0001, Yang Xiang 0003, Muyun Yang, Tiejun Zhao, Min Zhang 0005. 33019-33036 [doi]
- AraEval: An Arabic Multi-Task Evaluation Suite for Large Language ModelsAlhanoof Althnian, Norah A. Alzahrani, Shaykhah Z. Alsubaie, Eman Albilali, Ahmed Abdelali, Nouf M. Alotaibi, M. Saiful Bari, Yazeed Alnumay, Abdulhamed Alothaimen, Maryam Saif, Shahad D. Alzaidi, Faisal Abdulrahman Mirza, Yousef Almushayqih, Mohammed Al Saleem, Ghadah Alabduljabbar, AbdulMohsen Al-Thubaity, Areeb Alowisheq, Nora Al-Twairesh. 33037-33061 [doi]
- QUIDS: Query Intent Description for Exploratory Search via Dual Space ModelingYumeng Wang 0001, Xiuying Chen, Suzan Verberne. 33062-33077 [doi]
- A Systematic Survey of Automatic Prompt Optimization TechniquesKiran Ramnath, Kang Zhou, Sheng Guan, Soumya Smruti Mishra, Xuan Qi, Zhengyuan Shen, Shuai Wang, Sangmin Woo, Sullam Jeoung, Yawei Wang, Haozhu Wang, Han Ding 0004, Yuzhe Lu, Zhichao Xu, Yun Zhou, Balasubramaniam Srinivasan, Qiaojing Yan, Yueyan Chen, Haibo Ding, Panpan Xu, Lin Lee Cheong. 33078-33110 [doi]
- Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label VariationBeiduo Chen, Yang Janet Liu, Anna Korhonen, Barbara Plank. 33111-33135 [doi]
- MemInsight: Autonomous Memory Augmentation for LLM AgentsRana Salama, Jason Cai, Michelle Yuan, Anna Currey, Monica Sunkara, Yi Zhang, Yassine Benajiba. 33136-33152 [doi]
- Breaking the Noise Barrier: LLM-Guided Semantic Filtering and Enhancement for Multi-Modal Entity AlignmentChenglong Lu, Chenxiao Li, Jingwei Cheng, Yongquan Ji, Guoqing Chen, Fu Zhang. 33153-33167 [doi]
- ImpliRet: Benchmarking the Implicit Fact Retrieval ChallengeZeinab Sadat Taghavi, Ali Modarressi, Yunpu Ma, Hinrich Schütze. 33168-33190 [doi]
- No Need for Explanations: LLMs can implicitly learn from mistakes in-contextLisa Alazraki, Maximilian Mozes, Jon Ander Campos, Yi Chern Tan, Marek Rei, Max Bartolo. 33191-33215 [doi]
- MoVa: Towards Generalizable Classification of Human Morals and ValuesZiyu Chen, Junfei Sun, Chenxi Li, Tuan-Dung Nguyen, Jing Yao 0003, Xiaoyuan Yi, Xing Xie 0001, Chenhao Tan, Lexing Xie. 33216-33260 [doi]
- GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous ExplorationYue Fan, Handong Zhao, Ruiyi Zhang 0002, Yu Shen, Xin Eric Wang, Gang Wu 0013. 33261-33278 [doi]
- Revealing and Mitigating the Challenge of Detecting Character Knowledge Errors in LLM Role-PlayingWenyuan Zhang 0002, Shuaiyi Nie, Jiawei Sheng, Zefeng Zhang 0001, Xinghua Zhang 0001, Yongquan He, Tingwen Liu. 33279-33302 [doi]
- Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue LearningJiazheng Liu, Sipeng Zheng, Börje F. Karlsson, Zongqing Lu 0002. 33303-33324 [doi]
- Graph-Based Multi-Trait Essay ScoringShengjie Li, Vincent Ng. 33325-33351 [doi]
- Benchmarking LLMs on Semantic Overlap SummarizationJohn Salvador, Naman Bansal, Mousumi Akter 0001, Souvika Sarkar, Anupam Das 0008, Shubhra Kanti Karmaker. 33352-33373 [doi]
- N-CORE: N-View Consistency Regularization for Disentangled Representation Learning in Nonverbal VocalizationsSiddhant Bikram Shah, Kristina T. Johnson. 33374-33391 [doi]
- Probability Distribution Collapse: A Critical Bottleneck to Compact Unsupervised Neural Grammar InductionJinwook Park, Kangil Kim. 33392-33403 [doi]
- Spatial Layouts in News Homepages Capture Human PreferencesAlexander Spangher, Michael Vu, Arda Kaz, Naitian Zhou, Ben Welsh. 33404-33420 [doi]
- KRETA: A Benchmark for Korean Reading and Reasoning in Text-Rich VQA Attuned to Diverse Visual ContextsTaebaek Hwang, Minseo Kim, Gisang Lee, Seonuk Kim, Hyunjun Eun. 33421-33432 [doi]
- ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State ReflectionJeonghye Kim, Sojeong Rhee, Minbeom Kim, Dohyung Kim, Sangmook Lee, Youngchul Sung, Kyomin Jung. 33433-33465 [doi]
- CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome RewardShudong Liu 0007, Hongwei Liu, Junnan Liu, Linchen Xiao, Songyang Gao, Chengqi Lyu, Yuzhe Gu, Wenwei Zhang, Derek F. Wong, Songyang Zhang 0001, Kai Chen 0026. 33466-33494 [doi]
- A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-makingXiao Wu, Ting-Zhu Huang, Liang-Jian Deng, Yanyuan Qiao, Imran Razzak, Yutong Xie 0001. 33495-33512 [doi]
- Castle: Causal Cascade Updates in Relational Databases with Large Language ModelsYongye Su, Yucheng Zhang, Zeru Shi, Bruno Ribeiro 0001, Elisa Bertino. 33513-33525 [doi]
- Idiosyncratic Versus Normative Modeling of Atypical Speech Recognition: Dysarthric Case StudiesVishnu Raja, Adithya V. Ganesan, Anand Syamkumar, Ritwik Banerjee, H. Andrew Schwartz. 33526-33537 [doi]
- NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API CallsKinjal Basu 0002, Ibrahim Abdelaziz, Kiran Kate, Mayank Agarwal, Maxwell Crouse, Yara Rizk, Kelsey Bradford, Asim Munawar, Sadhana Kumaravel, Saurabh Goyal, Xin Wang, Luis A. Lastras, Pavan Kapanipathi. 33538-33547 [doi]
- Benchmarking and Mitigating MCQA Selection Bias of Large Vision-Language ModelsMd. Atabuzzaman, Ali Asgarov, Christopher Thomas 0004. 33548-33562 [doi]
- Can Large Language Models Unlock Novel Scientific Research Ideas?Sandeep Kumar 0009, Tirthankar Ghosal, Vinayak Goyal, Asif Ekbal. 33563-33587 [doi]
- Word Salad Chopper: Reasoning Models Waste A Ton Of Decoding Budget On Useless Repetitions, Self-KnowinglyWenya Xie, Shaochen Zhong, Hoang Anh Duy Le, Zhaozhuo Xu, Jianwen Xie, Zirui Liu 0001. 33588-33598 [doi]
- DIWALI - Diversity and Inclusivity aWare cuLture specific Items for India: Dataset and Assessment of LLMs for Cultural Text Adaptation in Indian ContextPramit Sahoo, Maharaj Brahma, Maunendra Sankar Desarkar. 33599-33626 [doi]
- SYNC: A Synthetic Long-Context Understanding Benchmark for Controlled Comparisons of Model CapabilitiesShuyang Cao, Kaijian Zou, Lu Wang. 33627-33648 [doi]
- OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ LanguagesChester Palen-Michel, Maxwell Pickering, Maya Kruse, Jonne Sälevä, Constantine Lignos. 33649-33674 [doi]
- Mondrian: A Framework for Logical Abstract (Re)StructuringElizabeth Orwig, Shinwoo Park, Hyundong Jin, Yo-Sub Han. 33675-33690 [doi]
- Case-Based Decision-Theoretic Decoding with Quality MemoriesHiroyuki Deguchi 0002, Masaaki Nagata. 33691-33706 [doi]
- PRIME: Large Language Model Personalization with Cognitive Dual-Memory and Personalized Thought ProcessXinliang Frederick Zhang, Nicholas Beauchamp, Lu Wang 0008. 33707-33736 [doi]
- Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic EvaluationsAnanth Agarwal, Jasper Jian, Christopher D. Manning, Shikhar Murty. 33737-33757 [doi]
- Image Difference Captioning via Adversarial Preference OptimizationZihan Huang, Junda Wu, Rohan Surana, Tong Yu 0001, David Arbour, Ritwik Sinha, Julian J. McAuley. 33758-33770 [doi]
- seqBench: A Tunable Benchmark to Quantify Sequential Reasoning Limits of LLMsMohammad Ramezanali, Mo Vazifeh, Paolo Santi. 33771-33792 [doi]
- NormGenesis: Multicultural Dialogue Generation via Exemplar-Guided Social Norm Modeling and Violation RecoveryMinki Hong, Jangho Choi, Jihie Kim. 33793-33831 [doi]
- SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT FormulasAnjiang Wei, Yuheng Wu, Yingjia Wan, Tarun Suresh, Huanmi Tan, Zhanke Zhou, Sanmi Koyejo, Ke Wang 0022, Alex Aiken. 33832-33849 [doi]
- Data Descriptions from Large Language Models with Influence EstimationChaeri Kim, Jaeyeon Bae, Taehwan Kim. 33850-33867 [doi]
- EquiBench: Benchmarking Large Language Models' Reasoning about Program Semantics via Equivalence CheckingAnjiang Wei, Jiannan Cao, Ran Li, Hongyu Chen, Yuhui Zhang, Ziheng Wang, Yuan Liu, Thiago S. F. X. Teixeira, Diyi Yang, Ke Wang, Alex Aiken. 33868-33881 [doi]
- MicroEdit: Neuron-level Knowledge Disentanglement and Localization in Lifelong Model EditingShiqi Wang 0006, Qi Wang 0078, Runliang Niu, He Kong 0004, Yi Chang 0001. 33882-33896 [doi]
- Do Large Language Models Understand Word Senses?Domenico Meconi, Simone Stirpe, Federico Martelli, Leonardo Lavalle, Roberto Navigli. 33897-33916 [doi]
- Diverse, not Short: A Length-Controlled Data Selection Strategy for Improving Response Diversity of Language ModelsVijeta Deshpande, Debasmita Ghose, John D. Patterson, Roger E. Beaty, Anna Rumshisky. 33917-33938 [doi]
- Uncovering the Bigger Picture: Comprehensive Event Understanding Via Diverse News RetrievalYixuan Tang, Yuanyuan Shi, Yiqun Sun, Anthony Kum Hoe Tung. 33939-33957 [doi]
- Personalized LLM Decoding via Contrasting Personal PreferenceHyungjune Bu, ChanJoo Jung, Minjae Kang, Jaehyung Kim. 33958-33978 [doi]
- The Missing Parts: Augmenting Fact Verification with Half Truth DetectionYixuan Tang, Jincheng Wang, Anthony Kum Hoe Tung. 33979-33996 [doi]
- Toward Machine Translation Literacy: How Lay Users Perceive and Rely on Imperfect TranslationsYimin Xiao, Yongle Zhang 0004, Dayeon Ki, Calvin Bao, Marianna J. Martindale, Charlotte Vaughn, Ge Gao 0001, Marine Carpuat. 33997-34014 [doi]
- Personalization up to a Point: Why Personalized Content Moderation Needs Boundaries, and How We Can Enforce ThemEmanuele Moscato, Tiancheng Hu, Matthias Orlikowski, Paul Röttger, Debora Nozza. 34015-34029 [doi]
- MPCG: Multi-Round Persona-Conditioned Generation for Modeling the Evolution of Misinformation with LLMsJun Rong Brian Chong, Yixuan Tang, Anthony Kum Hoe Tung. 34030-34064 [doi]
- LiTEx: A Linguistic Taxonomy of Explanations for Understanding Within-Label Variation in Natural Language InferencePingjun Hong, Beiduo Chen, Siyao Peng, Marie-Catherine de Marneffe, Barbara Plank. 34065-34085 [doi]
- LiteraryQA: Towards Effective Evaluation of Long-document Narrative QATommaso Bonomo, Luca Gioffrè, Roberto Navigli. 34086-34107 [doi]
- FillerSpeech: Towards Human-Like Text-to-Speech Synthesis with Filler Insertion and Filler Style ControlSeung-bin Kim, Junhyeok Cha, Hyung-Seok Oh, Heejin Choi, Seong-Whan Lee. 34108-34125 [doi]
- Multi-LMentry: Can Multilingual LLMs Solve Elementary Tasks Across Languages?Luca Moroni, Javier Aula-Blasco, Simone Conia, Irene Baucells, Naiara Pérez, Silvia Paniagua Suárez, Anna Salles, Malte Ostendorff, Júlia Falcão, Guijin Son, Aitor Gonzalez-Agirre, Roberto Navigli, Marta Villegas. 34126-34157 [doi]
- Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo QueryYixuan Wang, Shiyu Ji, Yijun Liu, Yuzhuang Xu, Yang Xu 0049, Qingfu Zhu, Wanxiang Che. 34158-34174 [doi]
- PerspectiveMod: A Perspectivist Resource for Deliberative ModerationEva Maria Vecchi, Neele Falk, Carlotta Quensel, Iman Jundi, Gabriella Lapesa. 34175-34198 [doi]
- LoCt-Instruct: An Automatic Pipeline for Constructing Datasets of Logical Continuous InstructionsHongyu Sun, Yusuke Sakai 0010, Haruki Sakajo, Shintaro Ozaki, Kazuki Hayashi, Hidetaka Kamigaito, Taro Watanabe. 34199-34218 [doi]
- CodeSSM: Towards State Space Models for Code UnderstandingShweta Verma, Abhinav Anand, Mira Mezini. 34219-34235 [doi]
- EduAdapt: A Question Answer Benchmark Dataset for Evaluating Grade-Level Adaptability in LLMsNumaan Naeem, Abdellah El Mekki, Muhammad Abdul-Mageed. 34236-34263 [doi]
- xCoRe: Cross-context Coreference ResolutionGiuliano Martinelli, Bruno Gatti, Roberto Navigli. 34264-34278 [doi]
- Retrieval-Augmented Generation with Estimation of Source ReliabilityJeongyeon Hwang, Junyoung Park, Hyejin Park, Dongwoo Kim, Sangdon Park, Jungseul Ok. 34279-34303 [doi]
- NitiBench: Benchmarking LLM Frameworks on Thai Legal Question Answering CapabilitiesPawitsapak Akarajaradwong, Pirat Pothavorn, Chompakorn Chaksangchaichot, Panuthep Tasawong, Thitiwat Nopparatbundit, Keerakiat Pratai, Sarana Nutanong. 34304-34327 [doi]
- From Input Perception to Predictive Insight: Modeling Model Blind Spots Before They Become ErrorsMaggie Mi, Aline Villavicencio, Nafise Sadat Moosavi. 34328-34341 [doi]
- Relations: Arabic Relation Extraction Corpus and ModelingAlaa Aljabari, Mohammed Khalilia, Mustafa Jarrar. 34342-34360 [doi]
- Conflicting Needles in a Haystack: How LLMs behave when faced with contradictory informationMurathan Kurfali, Robert Östling. 34361-34376 [doi]
- Towards Event Extraction with Massive Types: LLM-based Collaborative Annotation and Partitioning ExtractionWenxuan Liu, Zixuan Li 0001, Long Bai 0002, Yuxin Zuo, Daozhu Xu, Xiaolong Jin 0001, Jiafeng Guo, Xueqi Cheng. 34377-34399 [doi]
- Liaozhai through the Looking-Glass: On Paratextual Explicitation of Culture-Bound Terms in Machine TranslationSherrie Shen, Weixuan Wang, Alexandra Birch. 34400-34416 [doi]
- Concept-pedia: a Wide-coverage Semantically-annotated Multimodal DatasetKarim Ghonim, Andrei Stefan Bejgu, Alberte Fernández-Castro, Roberto Navigli. 34417-34438 [doi]
- RAED: Retrieval-Augmented Entity Description Generation for Emerging Entity Linking and DisambiguationKarim Ghonim, Pere-Lluís Huguet Cabot, Riccardo Orlando, Roberto Navigli. 34439-34452 [doi]
- Personalized Language Models via Privacy-Preserving Evolutionary Model MergingKyuyoung Kim, Jinwoo Shin, Jaehyung Kim. 34453-34468 [doi]
- Aligning Text/Speech Representations from Multimodal Models with MEG Brain Activity During ListeningPadakanti Srijith, Khushbu Pahwa, Radhika Mamidi, Bapi Raju Surampudi, Manish Gupta 0001, Subba Reddy Oota. 34469-34486 [doi]
- STARQA: A Question Answering Dataset for Complex Analytical Reasoning over Structured DatabasesMounica Maddela, Lingjue Xie, Daniel Preotiuc-Pietro, Mausam. 34487-34499 [doi]
- Slim-SC: Thought Pruning for Efficient Scaling with Self-ConsistencyColin Hong, Xu Guo, Anand Chaanan Singh, Esha Choukse, Dmitrii Ustiugov. 34500-34517 [doi]
- Long Chain-of-Thought Fine-tuning via Understanding-to-Reasoning TransitionChenxin An, Zhihui Xie 0002, Xiaonan Li, Ming Zhong 0005, Shansan Gong, Lei Li 0039, Jun Zhang 0003, Jingjing Xu 0001, Lingpeng Kong. 34518-34534 [doi]
- Exploring Large Language Models for Detecting Mental DisordersGleb Kuzmin, Petr Strepetov, Maksim Stankevich, Natalya V. Chudova, Artem Shelmanov, Ivan V. Smirnov. 34535-34559 [doi]
- Efficient Real-time Refinement of Language Model Text GenerationJoonho Ko, Jinheon Baek, Sung Ju Hwang. 34560-34573 [doi]
- Reward-Weighted Sampling: Enhancing Non-Autoregressive Characteristics in Masked Diffusion LLMsDaehoon Gwak, Minseo Jung, Junwoo Park, Minho Park 0003, ChaeHun Park, Junha Hyung, Jaegul Choo. 34574-34594 [doi]
- AI Argues Differently: Distinct Argumentative and Linguistic Patterns of LLMs in Persuasive ContextsEsra Dönmez, Maximilian Maurer, Gabriella Lapesa, Agnieszka Falenska. 34595-34626 [doi]
- TounsiBench: Benchmarking Large Language Models for Tunisian ArabicSouha Hassine, Asma Arrak, Marouene Addhoum, Steven R. Wilson 0001. 34627-34642 [doi]
- Moral Framing in Politics (MFiP): A new resource and models for moral framingInes Rehbein, Ines Reinig, Simone Paolo Ponzetto. 34643-34663 [doi]
- ReDepress: A Cognitive Framework for Detecting Depression Relapse from Social MediaAakash Kumar Agarwal, Saprativa Bhattacharjee, Mauli Rastogi, Jemima Jacob, Biplab Banerjee, Rashmi Gupta, Pushpak Bhattacharyya. 34664-34682 [doi]
- iKnow-audio: Integrating Knowledge Graphs with Audio-Language ModelsMichel Olvera, Changhong Wang, Paraskevas Stamatiadis, Gaël Richard, Slim Essid. 34683-34700 [doi]
- EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture VideosSourjyadip Ray, Shubham Sharma, Somak Aditya, Pawan Goyal 0002. 34701-34727 [doi]
- The Illusion of Progress: Re-evaluating Hallucination Detection in LLMsDenis Janiak, Jakub Binkowski, Albert Sawczyn, Bogdan Gabrys, Ravid Shwartz-Ziv, Tomasz Kajdanowicz. 34728-34745 [doi]
- Turning Logic Against Itself: Probing Model Defenses Through Contrastive QuestionsRachneet Singh Sachdeva, Rima Hazra, Iryna Gurevych. 34746-34776 [doi]
- CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text RecognitionSina J. Semnani, Han Zhang, Xinyan He, Merve Tekgurler, Monica Lam 0001. 34777-34824 [doi]
- Towards Author-informed NLP: Mind the Social BiasInbar Pendzel, Einat Minkov. 34825-34838 [doi]
- Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with Large Language ModelsSina J. Semnani, Jirayu Burapacheep, Arpandeep Khatua, Thanawan Atchariyachanvanit, Zheng Wang, Monica S. Lam. 34839-34866 [doi]
- Leveraging Multilingual Training for Authorship Representation: Enhancing Generalization across Languages and DomainsJunghwan Kim, Haotian Zhang, David Jurgens. 34867-34892 [doi]
- DrFrattn: Directly Learn Adaptive Policy from Attention for Simultaneous Machine TranslationLibo Zhao, Jing Li, Ziqian Zeng. 34893-34906 [doi]
- The Sound of Syntax: Finetuning and Comprehensive Evaluation of Language Models for Speech PathologyFagun Patel, Duc Q. Nguyen, Sang T. Truong, Jody Vaynshtok, Sanmi Koyejo, Nick Haber. 34907-34925 [doi]
- NormXLogit: The Head-on-Top Never LiesSina Abbasi, Mohammad Reza Modarres, Mohammad Taher Pilehvar. 34926-34947 [doi]
- Doc2Chart: Intent-Driven Zero-Shot Chart Generation from DocumentsAkriti Jain 0001, Pritika Ramu, Aparna Garimella, Apoorv Saxena. 34948-34963 [doi]
- Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction AmplificationBoyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem 0001, Michael Backes 0001, Savvas Zannettou, Yang Zhang 0016. 34964-34976 [doi]
- FoREST: Frame of Reference Evaluation in Spatial Reasoning TasksTanawan Premsri, Parisa KordJamshidi. 34977-35003 [doi]
- Multilinguality Does not Make Sense: Investigating Factors Behind Zero-Shot Cross-Lingual Transfer in Sense-Aware TasksRoksana Goworek, Haim Dubossarsky. 35004-35029 [doi]
- Translating Domain-Specific Terminology in Typologically-Diverse Languages: A Study in Tax and Financial EducationArturo Oncevay, Elena Kochkina, Keshav Ramani, Toyin Aguda, Simerjot Kaur, Charese Smiley. 35030-35044 [doi]
- Train It and Forget It: Merge Lists are Unnecessary for BPE Inference in Language ModelsTomohiro Sawada, Kartik Goyal. 35045-35058 [doi]
- Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space?Nandan Kumar Jha, Brandon Reagen. 35059-35070 [doi]
- TLUE: A Tibetan Language Understanding Evaluation BenchmarkFan Gao, Cheng Huang, Yutong Liu, Nyima Tashi, Xiangxiang Wang, Thupten Tsering, Ban Ma-bao, Renzeng Duojie, Gadeng Luosang, Rinchen Dongrub, Dorje Tashi, Xiao Feng 0001, Yongbin Yu, Hao Wang. 35071-35097 [doi]
- Retrieving Support to Rank Answers in Open-Domain Question AnsweringZeyu Zhang, Alessandro Moschitti, Thuy Vu. 35098-35105 [doi]
- Trojsten Benchmark: Evaluating LLM Problem-Solving in Slovak STEM Competition ProblemsAdam Zahradník, Marek Suppa. 35106-35121 [doi]
- BRSpeech-DF: A Deep Fake Synthetic Speech Dataset for Portuguese Zero-Shot TTSAlexandre Costa Ferro Filho, Rafaello Virgilli, Lucas Alcântara Souza, Frederico Santos de Oliveira, Marcelo Henrique Lopes Ferreira, Daniel Tunnermann, Gustavo dos Reis Oliveira, Anderson da Silva Soares, Arlindo Rodrigues Galvão Filho. 35122-35127 [doi]
- A Simple Yet Effective Method for Non-Refusing Context Relevant Fine-grained Safety Steering in LLMsShaona Ghosh, Amrita Bhattacharjee, Yftah Ziser, Christopher Parisien. 35128-35148 [doi]
- Statistical and Neural Methods for Hawaiian Orthography ModernizationJaden Kapali, Keaton Williamson, Winston Wu. 35149-35155 [doi]
- so much depends / upon / a whitespace: Why Whitespace Matters for Poets and LLMsSriharsh Bhyravajjula, Melanie Walsh, Anna Preus, Maria Antoniak. 35156-35173 [doi]
- Certified Mitigation of Worst-Case LLM Copyright InfringementJingyu Zhang, Jiacan Yu, Marc Marone, Benjamin Van Durme, Daniel Khashabi. 35174-35195 [doi]
- Quantifying Logical Consistency in Transformers via Query-Key AlignmentEduard Tulchinskii, Laida Kushnareva, Anastasia Voznyuk, Andrei Andriiainen, Irina Piontkovskaya, Evgeny Burnaev, Serguei Barannikov. 35196-35211 [doi]
- SimulatorArena: Are User Simulators Reliable Proxies for Multi-Turn Evaluation of AI Assistants?Yao Dou, Michel Galley, Baolin Peng, Chris Kedzie, Weixin Cai, Alan Ritter, Chris Quirk, Wei Xu 0004, Jianfeng Gao 0001. 35212-35290 [doi]
- CourtReasoner: Can LLM Agents Reason Like Judges?Sophia Simeng Han, Yoshiki Takashima, Shannon Zejiang Shen, Chen Liu, Yixin Liu, Roque K. Thuo, Sonia Knowlton, Ruzica Piskac, Scott J. Shapiro, Arman Cohan. 35291-35306 [doi]
- Not Your Typical Government Tipline: LLM-Assisted Routing of Environmental Protection Agency Citizen TipsSharanya Majumder, Zehua Li 0001, Derek Ouyang, Kit T. Rodolfa, Elena Eneva, Julian Nyarko, Daniel E. Ho. 35307-35315 [doi]
- Retracing the Past: LLMs Emit Training Data When They Get LostMyeongseob Ko, Nikhil Reddy Billa, Adam Nguyen, Charles Fleming, Ming Jin 0002, Ruoxi Jia 0001. 35316-35337 [doi]
- Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech RepresentationsLinyang He, Qiaolin Wang, Xilin Jiang, Nima Mesgarani. 35338-35353 [doi]
- Current Semantic-change Quantification Methods Struggle with Discovery in the WildKhonzoda Umarova, Lillian Lee, Laerdon Kim. 35354-35367 [doi]
- Evaluating Large Language Models for Detecting AntisemitismJay Patel, Hrudayangam Mehta, Jeremy Blackburn. 35368-35397 [doi]
- D-RAG: Differentiable Retrieval-Augmented Generation for Knowledge Graph Question AnsweringGuangze Gao, Zixuan Li, Chunfeng Yuan, Jiawei Li, Wu Jianzhuo, Yuehao Zhang, Xiaolong Jin, Bing Li, Weiming Hu. 35398-35417 [doi]
- Towards Robust Mathematical ReasoningThang Luong, Dawsen Hwang, Hoang H. Nguyen, Golnaz Ghiasi, Yuri Chervonyi, Insuk Seo, Junsu Kim, Garrett Bingham, Jonathan Lee 0002, Swaroop Mishra, Alex Zhai, Clara Huiyi Hu, Henryk Michalewski, Jimin Kim, Jeonghyun Ahn, Junhwi Bae, Xingyou Song, Trieu H. Trinh, Quoc V. Le, Junehyuk Jung. 35418-35442 [doi]
- Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Fine-tuningJunjie Xing, Yeye He, Mengyu Zhou, Haoyu Dong 0001, Shi Han, Dongmei Zhang 0001, Surajit Chaudhuri. 35443-35460 [doi]
- Introducing Spotlight: A Novel Approach for Generating Captivating Key Information from DocumentsAnkan Mullick, Sombit Bose, Rounak Saha, Ayan Kumar Bhowmick, Aditya Vempaty, Prasenjit Dey, Ravi Kokku, Pawan Goyal 0002, Niloy Ganguly. 35461-35489 [doi]
- Argument Summarization and its Evaluation in the Era of Large Language ModelsMoritz Altemeyer, Steffen Eger, Johannes Daxenberger, Yanran Chen, Tim Altendorf, Philipp Cimiano, Benjamin Schiller. 35490-35511 [doi]
- Computational Analysis of Conversation Dynamics through Participant ResponsivityMargaret A. Hughes, Brandon Roy, Elinor Poole-Dayan, Deb Roy, Jad Kabbara. 35512-35531 [doi]
- AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language ModelsSangjun Lee, Seung-taek Woo, Jungyu Jin, Changhun Lee, Eunhyeok Park. 35532-35550 [doi]
- Beyond Averages: Learning with Annotator Disagreement in STSAlejandro Benito-Santos, Adrián Ghajari. 35551-35557 [doi]
- Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning TasksWenyang Hu, Gregory Kang Ruey Lau, Diwen Liu, Jizhuo Chen, See-Kiong Ng, Bryan Kian Hsiang Low. 35558-35572 [doi]
- Constrained Non-negative Matrix Factorization for Guided Topic Modeling of Minority TopicsSeyedeh Fatemeh Ebrahimi, Jaakko Peltonen. 35573-35598 [doi]
- Which Word Orders Facilitate Length Generalization in LMs? An Investigation with GCG-Based Artificial LanguagesNadine El-Naggar, Tatsuki Kuribayashi, Ted Briscoe. 35599-35613 [doi]
- Training compute-optimal transformer encoder modelsMegi Dervishi, Alexandre Allauzen, Gabriel Synnaeve, Yann LeCun. 35614-35629 [doi]
- Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM ReviewsHyungyu Shin, Jingyu Tang, Yoonjoo Lee, Nayoung Kim, Hyunseung Lim, Ji Yong Cho, Hwajung Hong, Moontae Lee, Juho Kim 0001. 35630-35656 [doi]
- Seeing Through Words, Speaking Through Pixels: Deep Representational Alignment Between Vision and Language ModelsZoe Wanying He, Sean Trott, Meenakshi Khosla. 35657-35672 [doi]
- Unconditional Truthfulness: Learning Unconditional Uncertainty of Large Language ModelsArtem Vazhentsev, Ekaterina Fadeeva, Rui Xing 0002, Gleb Kuzmin, Ivan Lazichny, Alexander Panchenko, Preslav Nakov, Timothy Baldwin, Maxim Panov, Artem Shelmanov. 35673-35694 [doi]
- Chinese Toxic Language Mitigation via Sentiment Polarity Consistent RewritesXintong Wang 0001, Yixiao Liu, Jingheng Pan, Liang Ding 0006, Longyue Wang, Chris Biemann. 35695-35711 [doi]
- A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM OutputsArtem Shelmanov, Ekaterina Fadeeva, Akim Tsvigun, Ivan Tsvigun, Zhuohan Xie, Igor Kiselev, Nico Daheim, Caiqi Zhang, Artem Vazhentsev, Mrinmaya Sachan, Preslav Nakov, Timothy Baldwin. 35712-35731 [doi]