- Frontmatter
- EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association. Weiqi Wang, Limeng Cui, Xin Liu, Sreyashi Nag, Wenju Xu, Chen Luo 0003, Sheikh Muhammad Sarwar, Yang Li, Hansu Gu, Hui Liu, Changlong Yu, Jiaxin Bai, Yifan Gao 0001, Haiyang Zhang, Qi He, Shuiwang Ji, Yangqiu Song. 1-22
- GraphNarrator: Generating Textual Explanations for Graph Neural Networks. Bo Pan, Zhen Xiong, Guanchen Wu, Zheng Zhang 0047, Yifei Zhang 0006, Yuntong Hu, Liang Zhao 0002. 23-42
- M-RewardBench: Evaluating Reward Models in Multilingual Settings. Srishti Gureja, Lester James Validad Miranda, Shayekh Bin Islam, Rishabh Maheshwary, Drishti Sharma, Gusti Triandi Winata, Nathan Lambert 0001, Sebastian Ruder, Sara Hooker, Marzieh Fadaee. 43-58
- ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming. Xinwei Yang, Zhaofeng Liu, Chen Huang 0006, Jiashuai Zhang, Tong Zhang, Yifan Zhang, Wenqiang Lei. 59-104
- The Impossibility of Fair LLMs. Jacy Reese Anthis, Kristian Lum, Michael D. Ekstrand, Avi Feller, Chenhao Tan. 105-120
- Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process. Ermo Hua, Biqing Qi, Kaiyan Zhang, Kai Tian, Xingtai Lv, Ning Ding 0002, Bowen Zhou 0002. 121-136
- Bias in Language Models: Beyond Trick Tests and Towards RUTEd Evaluation. Kristian Lum, Jacy Reese Anthis, Kevin Robinson, Chirag Nagpal, Alexander Nicholas D'Amour. 137-161
- Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models. Wenhan Liu, Xinyu Ma, Yutao Zhu 0001, Ziliang Zhao, Shuaiqiang Wang, Dawei Yin, Zhicheng Dou. 162-176
- The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It. Aaron Nicolson, Shengyao Zhuang, Jason Dowling, Bevan Koopman. 177-203
- CLEME2.0: Towards Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction. Jingheng Ye, Zishan Xu, Yinghui Li, Linlin Song, Qingyu Zhou, Hai-Tao Zheng 0002, Ying Shen 0001, Wenhao Jiang, Hong-Gee Kim, Ruitong Liu, Xin Su, Zifei Shan. 204-222
- StrucText-Eval: Evaluating Large Language Model's Reasoning Ability in Structure-Rich Text. Zhouhong Gu, Haoning Ye, Xingzhou Chen, Zeyang Zhou, Hongwei Feng, Yanghua Xiao. 223-244
- Literature Meets Data: A Synergistic Approach to Hypothesis Generation. Haokun Liu, Yangqiaoyu Zhou, Mingxuan Li, Chenfei Yuan, Chenhao Tan. 245-281
- GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization. Zhouhong Gu, Xingzhou Chen, Xiaoran Shi, Tao Wang, Suhang Zheng, Tianyu Li, Hongwei Feng, Yanghua Xiao. 282-296
- Tree-of-Evolution: Tree-Structured Instruction Evolution for Code Generation in Large Language Models. Ziyang Luo, Kaixin Li, Hongzhan Lin 0001, Yuchen Tian, Mohan S. Kankanhalli, Jing Ma 0004. 297-316
- Delving into Multilingual Ethical Bias: The MSQAD with Statistical Hypothesis Tests for Large Language Models. Seunguk Yu, Juhwan Choi, Youngbin Kim. 317-340
- ReSCORE: Label-free Iterative Retriever Training for Multi-hop Question Answering with Relevance-Consistency Supervision. Dosung Lee, Wonjun Oh, Boyoung Kim, Minyoung Kim, Joonsuk Park, Paul Hongsuck Seo. 341-359
- FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models. Hongzhan Lin 0001, Yang Deng 0002, Yuxuan Gu, Wenxuan Zhang 0001, Jing Ma 0004, See-Kiong Ng, Tat-Seng Chua. 360-381
- Statistical Deficiency for Task Inclusion Estimation. Loïc Fosse, Frédéric Béchet, Benoît Favre, Géraldine Damnati, Gwénolé Lecorvé, Maxime Darrin, Philippe Formont, Pablo Piantanida. 382-415
- Towards Robust and Efficient Federated Low-Rank Adaptation with Heterogeneous Clients. Jabin Koo, Minwoo Jang, Jungseul Ok. 416-429
- LLM-Powered Test Case Generation for Detecting Bugs in Plausible Programs. Kaibo Liu, Zhenpeng Chen, Yiyang Liu, Jie M. Zhang, Mark Harman, Yudong Han, Yun Ma 0002, Yihong Dong, Ge Li 0001, Gang Huang 0001. 430-440
- Capture the Key in Reasoning to Enhance CoT Distillation Generalization. Chengwei Dai, Kun Li, Wei Zhou 0019, Songlin Hu 0001. 441-465
- How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond. Chen Huang 0006, Yang Deng 0002, Wenqiang Lei, Jiancheng Lv 0001, Tat-Seng Chua, Jimmy Huang 0001. 466-488
- Enhancing Hyperbole and Metaphor Detection with Their Bidirectional Dynamic Interaction and Emotion Knowledge. Li Zheng, Sihang Wang, Hao Fei 0001, Zuquan Peng, Fei Li 0021, Jianming Fu, Chong Teng, Donghong Ji. 489-499
- UniICL: An Efficient ICL Framework Unifying Compression, Selection, and Generation. Jun Gao, Qi Lv 0001, Zili Wang, Tianxiang Wu, Ziqiang Cao, Wenjie Li 0002. 500-510
- BelarusianGLUE: Towards a Natural Language Understanding Benchmark for Belarusian. Maksim Aparovich, Volha Harytskaya, Vladislav Poritski, Oksana Volchek, Pavel Smrz. 511-527
- A Survey on Foundation Language Models for Single-cell Biology. Fan Zhang, Hao Chen 0011, Zhihong Zhu, Ziheng Zhang, Zhenxi Lin, Ziyue Qiao, Yefeng Zheng 0001, Xian Wu 0001. 528-549
- RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios. Ruiwen Zhou, Wenyue Hua, Liangming Pan, Sitao Cheng, Xiaobao Wu, En Yu, William Yang Wang. 550-572
- Extending LLM Context Window with Adaptive Grouped Positional Encoding: A Training-Free Method. Xinhao Xu, Jiaxin Li, Hui Chen 0013, Zijia Lin, Jungong Han, Guiguang Ding. 573-587
- Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models. Sungjae Lee, Hyejin Park 0002, Jaechang Kim 0001, Jungseul Ok. 588-606
- HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval. Arian Askari, Emmanouil Stergiadis, Ilya Gusev, Moran Beladev. 607-619
- Can Multimodal Large Language Models Understand Spatial Relations? JingPing Liu, Ziyan Liu, Zhedong Cen, Yan Zhou, Yinan Zou, Weiyan Zhang, Haiyun Jiang, Tong Ruan. 620-632
- S³ - Semantic Signal Separation. Márton Kardos, Jan Kostkan, Kenneth C. Enevoldsen, Arnault-Quentin Vermillet, Kristoffer L. Nielbo, Roberta Rocca. 633-666
- TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs. Lanxiang Hu, Tajana Rosing, Hao Zhang 0025. 667-681
- JuStRank: Benchmarking LLM Judges for System Ranking. Ariel Gera, Odellia Boni, Yotam Perlitz, Roy Bar-Haim, Lilach Eden, Asaf Yehudai. 682-712
- Generating Diverse Training Samples for Relation Extraction with Large Language Models. Zexuan Li, Hongliang Dai, Piji Li. 713-726
- MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts. Dominik Macko, Jakub Kopal, Róbert Móro, Ivan Srba. 727-752
- Efficient and Accurate Prompt Optimization: the Benefit of Memory in Exemplar-Guided Reflection. Cilin Yan, Jingyun Wang, Lin Zhang, Ruihui Zhao, Xiaopu Wu, Kai Xiong, Qingsong Liu, Guoliang Kang, Yangyang Kang. 753-779
- Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation. Aneta Zugecova, Dominik Macko, Ivan Srba, Róbert Móro, Jakub Kopál, Katarina Marcincinova, Matús Mesarcík. 780-797
- EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents. Cheng Qian 0008, Peixuan Han, Qinyu Luo, Bingxiang He, Xiusi Chen, Yuji Zhang 0002, Hongyi Du, Jiarui Yao, Xiaocheng Yang, Denghui Zhang, Yunzhu Li, Heng Ji 0001. 798-820
- BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving. Teng Wang, Wing Yin Yu, Zhenqi He, Zehua Liu, Hailei Gong, Han Wu 0004, Xiongwei Han, Wei Shi, Ruifeng She, Fangzhou Zhu, Tao Zhong. 821-838
- LACA: Improving Cross-lingual Aspect-Based Sentiment Analysis with LLM Data Augmentation. Jakub Smíd, Pavel Pribán, Pavel Král. 839-853
- Fusing Highly Specialized Language Models for Comprehensive Expertise. Ning Ding 0002, Yulin Chen, Ganqu Cui, Xingtai Lv, Weilin Zhao, Kaiyan Zhang, Ruobing Xie, Bowen Zhou 0002, Zhiyuan Liu 0001, Maosong Sun 0001. 854-878
- HybGRAG: Hybrid Retrieval-Augmented Generation on Textual and Relational Knowledge Bases. Meng-Chieh Lee, Qi Zhu 0008, Costas Mavromatis, Zhen Han, Soji Adeshina, Vassilis N. Ioannidis, Huzefa Rangwala, Christos Faloutsos. 879-893
- Re-ranking Using Large Language Models for Mitigating Exposure to Harmful Content on Social Media Platforms. Rajvardhan Oak, Muhammad Haroon, Claire Wonjeong Jo, Magdalena Wojcieszak, Anshuman Chhabra. 894-908
- Aligning AI Research with the Needs of Clinical Coding Workflows: Eight Recommendations Based on US Data Analysis and Critical Review. Yidong Gan, Maciej Rybinski, Ben Hachey, Jonathan K. Kummerfeld. 909-922
- MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection. Ziyan Liu, Chunxiao Fan 0001, Haoran Lou, Yuexin Wu, Kaiwei Deng. 923-947
- EvoWiki: Evaluating LLMs on Evolving Knowledge. Wei Tang 0015, Yixin Cao 0006, Yang Deng 0002, Jiahao Ying, Bo Wang, Yizhe Yang, Yuyue Zhao, Qi Zhang 0001, Xuanjing Huang 0001, Yu-Gang Jiang 0001, Yong Liao. 948-964
- Rethinking Repetition Problems of LLMs in Code Generation. Yihong Dong, Yuchen Liu, Xue Jiang, Bin Gu, Zhi Jin, Ge Li. 965-985
- PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension. Kun Ouyang, Yuanxin Liu, Shicheng Li, Yi Liu, Hao Zhou 0012, Fandong Meng, Jie Zhou 0016, Xu Sun 0001. 986-1008
- ProcessBench: Identifying Process Errors in Mathematical Reasoning. Chujie Zheng, Zhenru Zhang, Beichen Zhang, Runji Lin, Keming Lu, Bowen Yu 0002, Dayiheng Liu, Jingren Zhou 0001, Junyang Lin. 1009-1024
- Model Extrapolation Expedites Alignment. Chujie Zheng, Ziqi Wang 0003, Heng Ji 0001, Minlie Huang, Nanyun Peng 0001. 1025-1041
- ATLANTIS: Weak-to-Strong Learning via Importance Sampling. Yi Liu, Guoyin Wang, Shicheng Li, Feifan Song 0001, Xu Sun 0001. 1042-1052
- MPVStance: Mitigating Hallucinations in Stance Detection with Multi-Perspective Verification. Zhaodan Zhang, Zhao Zhang, Jin Zhang, Hui Xu, Xueqi Cheng. 1053-1067
- Personality-Guided Code Generation Using Large Language Models. Yaoqi Guo, Zhenpeng Chen, Jie M. Zhang, Yang Liu 0003, Yun Ma 0002. 1068-1080
- PsyDT: Using LLMs to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling. Haojie Xie, Yirong Chen, Xiaofen Xing, Jingkai Lin, Xiangmin Xu. 1081-1115
- BIPro: Zero-shot Chinese Poem Generation via Block Inverse Prompting Constrained Generation Framework. Xu Zou. 1116-1134
- LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating. Chao Deng, Jiale Yuan, Pi Bu, Peijie Wang, Zhong-Zhi Li, Jian Xu 0015, Xiao-hui Li, Yuan Gao, Jun Song, Bo Zheng 0007, Cheng-Lin Liu 0001. 1135-1159
- ObfusLM: Privacy-preserving Language Model Service against Embedding Inversion Attacks. Yu Lin, Ruining Yang, Yunlong Mao, Qizhi Zhang, Jue Hong, Quanwei Cai 0003, Ye Wu, Huiqi Liu, ZhiYu Chen, Bing Duan, Sheng Zhong 0002. 1160-1174
- Interlocking-free Selective Rationalization Through Genetic-based Learning. Federico Ruggeri, Gaetano Signorelli. 1175-1191
- Re-identification of De-identified Documents with Autoregressive Infilling. Lucas Georges Gabriel Charpentier, Pierre Lison. 1192-1209
- Modeling Uncertainty in Composed Image Retrieval via Probabilistic Embeddings. Haomiao Tang, Jinpeng Wang 0002, Yuang Peng, Guanghao Meng, Ruisheng Luo, Bin Chen 0011, Long Chen 0016, Yaowei Wang 0001, Shutao Xia. 1210-1222
- Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models. Junfeng Tian, Da Zheng, Yang Chen, Rui Wang, Colin Zhang, Debing Zhang. 1223-1242
- APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts. Honghua Dong, Qidong Su, Yubo Gao, Zhaoyu Li, Yangjun Ruan, Gennady Pekhimenko, Chris J. Maddison, Xujie Si. 1243-1266
- Evaluating Lexical Proficiency in Neural Language Models. Cristiano Ciaccio, Alessio Miaschi, Felice dell'Orletta. 1267-1286
- Autoregressive Speech Synthesis without Vector Quantization. Lingwei Meng, Long Zhou, Shujie Liu 0001, Sanyuan Chen, Bing Han 0008, Shujie Hu, Yanqing Liu, Jinyu Li 0001, Sheng Zhao, Xixin Wu, Helen M. Meng, Furu Wei. 1287-1300
- Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM's Nest. Letian Peng, Zilong Wang 0002, Feng Yao, Jingbo Shang. 1301-1315
- FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Large Language Models. Raghav Singhal, Kaustubh Ponkshe, Praneeth Vepakomma. 1316-1336
- Measuring Social Biases in Masked Language Models by Proxy of Prediction Quality. Rahul Zalkikar, Kanchan Chandra. 1337-1361
- Capturing Author Self Beliefs in Social Media Language. Siddharth Mangalik, Adithya V. Ganesan, Abigail B. Wheeler, Nicholas Kerry, Jeremy D. W. Clifton, H. Andrew Schwartz, Ryan L. Boyd. 1362-1376
- Neural Topic Modeling with Large Language Models in the Loop. Xiaohao Yang, He Zhao 0001, Weijie Xu, Yuanyuan Qi, Jueqing Lu, Dinh Phung 0001, Lan Du 0002. 1377-1401
- HALoGEN: Fantastic LLM Hallucinations and Where to Find Them. Abhilasha Ravichander, Shrusti Ghela, David Wadden, Yejin Choi 0001. 1402-1425
- Synergizing LLMs with Global Label Propagation for Multimodal Fake News Detection. Shuguo Hu, Jun Hu, Huaiwen Zhang. 1426-1440
- "Yes, My LoRD." Guiding Language Model Extraction with Locality Reinforced Distillation. Zi Liang, Qingqing Ye 0001, Yanyun Wang 0003, Sen Zhang 0002, Yaxin Xiao, Ronghua Li, Jianliang Xu, Haibo Hu 0001. 1441-1465
- Jailbreak Large Vision-Language Models Through Multi-Modal Linkage. Yu Wang, Xiaofei Zhou, Yichen Wang, Geyuan Zhang, Tianxing He. 1466-1494
- Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options. Gracjan Góral, Emilia Wisnios, Piotr Sankowski, Pawel Budzianowski. 1495-1515
- The Hidden Attention of Mamba Models. Ameen Ali, Itamar Zimerman, Lior Wolf. 1516-1534
- KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding. Luohe Shi, Zuchao Li, Lefei Zhang, Baoyuan Qi, Liu Guoming, Hai Zhao 0001. 1535-1550
- LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models. Yan Wang, Ling Ding, Tien N. Nguyen, Shaohua Wang, Yanan Zheng. 1551-1567
- MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset. Weiqi Wang 0001, Yangqiu Song. 1568-1596
- Ask-Before-Detection: Identifying and Mitigating Conformity Bias in LLM-Powered Error Detector for Math Word Problem Solutions. Hang Li 0007, Tianlong Xu, Kaiqi Yang 0001, Yucheng Chu, Yanling Chen, Yichi Song, Qingsong Wen, Hui Liu 0031. 1597-1609
- Real-time Factuality Assessment from Adversarial Feedback. Sanxing Chen, Yukun Huang, Bhuwan Dhingra. 1610-1630
- Improve Vision Language Model Chain-of-thought Reasoning. Ruohong Zhang, Bowen Zhang, Yanghao Li, Haotian Zhang, Zhiqing Sun, Zhe Gan, Yinfei Yang, Ruoming Pang, Yiming Yang. 1631-1662
- On the Mutual Influence of Gender and Occupation in LLM Representations. Haozhe An, Connor Baumler, Abhilasha Sancheti, Rachel Rudinger. 1663-1680
- Disentangling Memory and Reasoning Ability in Large Language Models. Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang 0003, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang. 1681-1701
- Open-World Attribute Mining for E-Commerce Products with Multimodal Self-Correction Instruction Tuning. Jiaqi Li, Yanming Li, Xiaoli Shen, Chuanyi Zhang, Guilin Qi, Sheng Bi. 1702-1714
- Normalized AOPC: Fixing Misleading Faithfulness Metrics for Feature Attributions Explainability. Joakim Edin, Andreas Geert Motzfeldt, Casper L. Christensen, Tuukka Ruotsalo, Lars Maaløe, Maria Maistro. 1715-1730
- Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling. Yuguang Yang 0005, Yu Pan, Jixun Yao, Xiang Zhang, Jianhao Ye, Hongbin Zhou, Lei Xie 0001, Lei Ma 0003, Jianjun Zhao 0001. 1731-1742
- LangSAMP: Language-Script Aware Multilingual Pretraining. Yihong Liu 0001, Haotian Ye, Chunlan Ma, Mingyang Wang 0003, Hinrich Schütze. 1743-1770
- RelationalCoder: Rethinking Complex Tables via Programmatic Relational Transformation. Haoyu Dong 0001, Yue Hu 0002, Huailiang Peng, Yanan Cao 0001. 1771-1784
- Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study. Bolei Ma, Berk Yoztyurk, Anna-Carolina Haensch, Xinpeng Wang 0003, Markus Herklotz, Frauke Kreuter, Barbara Plank, Matthias Aßenmacher. 1785-1809
- TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos. Fanheng Kong, Jingyuan Zhang, Hongzhi Zhang, Shi Feng 0001, Daling Wang, Linhao Yu, Xingguang Ji, Yu Tian, Victoria W., Fuzheng Zhang. 1810-1839
- Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs. Zhuo Li, Yuhao Du, Jinpeng Hu, Xiang Wan, Anningzhe Gao. 1840-1857
- Binary Classifier Optimization for Large Language Model Alignment. Seungjae Jung, Gunsoo Han, Daniel Wontae Nam, Kyoung-woon On. 1858-1872
- UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs' Memorization. Md Nayem Uddin, Amir Saeidi, Divij Handa, Agastya Seth, Tran Cao Son, Eduardo Blanco 0002, Steven R. Corman, Chitta Baral. 1873-1913
- From Information to Insight: Leveraging LLMs for Open Aspect-Based Educational Summarization. Yang Zhong, Diane J. Litman. 1914-1947
- AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset. Charles Nimo, Tobi Olatunji, Abraham Toluwase Owodunni, Tassallah Abdullahi, Emmanuel Ayodele, Mardhiyah Sanni, Ezinwanne C. Aka, Folafunmi Omofoye, Foutse Yuehgoh, Timothy Faniran, Bonaventure F. P. Dossou, Moshood O. Yekini, Jonas Kemp, Katherine A. Heller, Jude Chidubem Omeke, Chidi Asuzu MD, Naome A. Etori, Aimérou Ndiaye, Ifeoma Okoh, Evans Doe Ocansey, Wendy Kinara, Michael L. Best, Irfan Essa, Stephen Edward Moore, Chris Fourie, Mercy Nyamewaa Asiedu. 1948-1973
- Root Defense Strategies: Ensuring Safety of LLM at the Decoding Level. Xinyi Zeng, Yuying Shang, Jiawei Chen, Jingyuan Zhang, Yu Tian. 1974-1988
- In-the-wild Audio Spatialization with Flexible Text-guided Localization. Tianrui Pan, Jie Liu 0040, Zewen Huang, Jie Tang 0006, Gangshan Wu. 1989-2001
- L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models. Hyesung Jeon, Yulhwa Kim, Jae-Joon Kim. 2002-2024
- Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion. Jianqing Zhu, Huang Huang, Zhihang Lin, Juhao Liang, Zhengyang Tang, Khalid Almubarak, Mosen Alharthi, Bang An 0004, Juncai He, Xiangbo Wu, Fei Yu, Junying Chen, Zhuoheng Ma, Yuhao Du, He Zhang, Saied Alshahrani, Emad A. Alghamdi, Lian Zhang, Ruoyu Sun 0001, Haizhou Li 0001, Benyou Wang, Jinchao Xu. 2025-2042
- What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs. Sangyeop Kim 0001, Yohan Lee, Yongwoo Song, Kimin Lee. 2043-2063
- ECERC: Evidence-Cause Attention Network for Multi-Modal Emotion Recognition in Conversation. Tao Zhang, Zhenhua Tan. 2064-2077
- CompileAgent: Automated Real-World Repo-Level Compilation with Tool-Integrated LLM-based Agent System. Li Hu, Guoqiang Chen, Xiuwei Shang, Shaoyin Cheng, BenLong Wu, Li Gangyang, Xu Zhu, Weiming Zhang 0001, Nenghai Yu. 2078-2091
- Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions. Matthias Orlikowski, Jiaxin Pei, Paul Röttger, Philipp Cimiano, David Jurgens, Dirk Hovy. 2092-2111
- Exploring Forgetting in Large Language Model Pre-Training. Chonghua Liao, Ruobing Xie, Xingwu Sun, Haowen Sun, Zhanhui Kang. 2112-2127
- Bias in the Mirror: Are LLMs opinions robust to their own adversarial attacks. Virgile Rennard, Christos Xypolopoulos, Michalis Vazirgiannis. 2128-2143
- AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents. Yifan Xu, Xiao Liu 0036, Xueqiao Sun, Siyi Cheng, Hao Yu, Hanyu Lai, Shudan Zhang, Dan Zhang, Jie Tang 0001, Yuxiao Dong. 2144-2166
- Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment. Yongxin Huang, Kexin Wang, Goran Glavas, Iryna Gurevych. 2167-2187
- Multimodal Transformers are Hierarchical Modal-wise Heterogeneous Graphs. Yijie Jin, Junjie Peng, Xuanchao Lin, Haochen Yuan, Lan Wang, Cangzhi Zheng. 2188-2209
- Have We Designed Generalizable Structural Knowledge Promptings? Systematic Evaluation and Rethinking. Yichi Zhang 0009, Zhuo Chen 0007, Lingbing Guo, Yajing Xu, Shaokai Chen, Mengshu Sun, Binbin Hu, Zhiqiang Zhang 0012, Lei Liang 0002, Wen Zhang 0015, Huajun Chen. 2210-2226
- LLäMmlein: Transparent, Compact and Competitive German-Only Language Models from Scratch. Jan Pfister, Julia Wunderle, Andreas Hotho. 2227-2246
- Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues. Youngmin Kim, Jiwan Chung, Jisoo Kim 0006, Sunghyun Lee, Sangkyu Lee, Junhyeok Kim 0002, Cheoljong Yang, Youngjae Yu. 2247-2265
- How Much Do Encoder Models Know About Word Senses? Simone Teglia, Simone Tedeschi, Roberto Navigli. 2266-2277
- When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations. Huaizhi Ge, Yiming Li, Qifan Wang, Yongfeng Zhang, Ruixiang Tang. 2278-2296
- HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter. Manuel Tonneau, Diyi Liu, Niyati Malhotra, Scott A. Hale, Samuel Fraiberger, Víctor Orozco-Olvera, Paul Röttger. 2297-2321
- LegalAgentBench: Evaluating LLM Agents in Legal Domain. Haitao Li 0006, Junjie Chen, Jingli Yang, Qingyao Ai, Wei Jia, Youfeng Liu, Kai Lin, Yueyue Wu, Guozhi Yuan, Yiran Hu, Wuyue Wang, Yiqun Liu 0001, Minlie Huang. 2322-2344
- Inference Compute-Optimal Video Vision Language Models. Peiqi Wang, Shengyun Peng, Xuewen Zhang, Hanchao Yu, Yibo Yang, Lifu Huang, Fujun Liu, Qifan Wang. 2345-2374
- Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models. Anirudh Sundar, Sinead Williamson, Katherine Metcalf, Barry-John Theobald, Skyler Seto, Masha Fedzechkina. 2375-2401
- Digital Gatekeepers: Google's Role in Curating Hashtags and Subreddits. Amrit Poudel, Yifan Ding, Tim Weninger, Jürgen Pfeffer. 2402-2415
- Behind Closed Words: Creating and Investigating the forePLay Annotated Dataset for Polish Erotic Discourse. Anna Kolos, Katarzyna Lorenc, Emilia Wisnios, Agnieszka Karlinska. 2416-2432
- Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales. Maor Reuben, Ortal Slobodin, Idan-Chaim Cohen, Aviad Elyashar, Orna Braun-Lewensohn, Odeya Cohen, Rami Puzis. 2433-2444
- Did Translation Models Get More Robust Without Anyone Even Noticing? Ben Peters, André F. T. Martins. 2445-2458
- Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset. Dan Su 0003, Kezhi Kong, Ying Lin, Joseph Jennings, Brandon Norick, Markus Kliegl, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro. 2459-2475
- Hierarchical Level-Wise News Article Clustering via Multilingual Matryoshka Embeddings. Hans William Alexander Hanley, Zakir Durumeric. 2476-2492
- Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models. Tassilo Klein, Moin Nabi. 2493-2508
- INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent. Haohang Li, Yupeng Cao, Yangyang Yu, Shashidhar Reddy Javaji, Zhiyang Deng, Yueru He, Yuechen Jiang, Zining Zhu 0005, K. P. Subbalakshmi, Jimin Huang, Lingfei Qian, Xueqing Peng, Jordan W. Suchow, Qianqian Xie. 2509-2525
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference. Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Griffin Thomas Adams, Jeremy Howard, Iacopo Poli. 2526-2547
- Gender Inclusivity Fairness Index (GIFI): A Multilevel Framework for Evaluating Gender Diversity in Large Language Models. Zhengyang Shan, Emily Diana, Jiawei Zhou. 2548-2579
- D.Va: Validate Your Demonstration First Before You Use It. Qi Zhang, Zhiqing Xiao, Ruixuan Xiao, Lirong Gao, Junbo Zhao 0002. 2580-2594
- Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists? Jiwan Chung, Janghan Yoon, Junhyeong Park, Sangeyl Lee, Joowon Yang, Sooyeon Park, Youngjae Yu. 2595-2606
- MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation. Chia-Yuan Chang 0002, Zhimeng Jiang, Vineeth Rakesh, Menghai Pan, Chin-Chia Michael Yeh, Guanchu Wang, Mingzhi Hu, Zhichao Xu, Yan Zheng 0001, Mahashweta Das, Na Zou 0001. 2607-2622
- Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning. Hui Liu 0036, Wenya Wang, Hao Sun, Chris Xing Tian, Chenqi Kong, Xin Dong, Haoliang Li. 2623-2641
- Direct Prompt Optimization with Continuous Representations. Yangkun Wang, Zihan Wang 0001, Jingbo Shang. 2642-2652
- uMedSum: A Unified Framework for Clinical Abstractive Summarization. Aishik Nagar, Yutong Liu, Andy T. Liu, Viktor Schlegel, Vijay Prakash Dwivedi, Arun-Kumar Kaliya-Perumal, Guna Pratheep Kalanchiam, Yili Tang, Robby T. Tan. 2653-2672
- GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement. Yifan Yang 0005, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang 0006, Yexing Du, Ziyang Ma 0001, Xunying Liu, Ziyuan Wang, Ke Li 0018, Shuai Fan 0005, Kai Yu 0004, Wei-Qiang Zhang 0001, Guoguo Chen, Xie Chen 0001. 2673-2686
- Context-Aware Sentiment Forecasting via LLM-based Multi-Perspective Role-Playing Agents. Fanhang Man, Huandong Wang, Jianjie Fang, Zhaoyi Deng, Baining Zhao, Xinlei Chen, Yong Li. 2687-2703
- TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data. Xiang Huang 0007, Jiayu Shen, Shanshan Huang, Sitao Cheng, Xiaxia Wang 0001, Yuzhong Qu. 2704-2726
- AndroidGen: Building an Android Language Agent under Data Scarcity. Hanyu Lai, Junjie Gao, Xiao Liu 0036, Yifan Xu, Shudan Zhang, Yuxiao Dong, Jie Tang 0001. 2727-2749
- Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation. Mingxuan Xia, Haobo Wang, Yixuan Li, Zewei Yu, Jindong Wang, Junbo Zhao, Runze Wu. 2750-2770
- A Survey of Post-Training Scaling in Large Language Models. Hanyu Lai, Xiao Liu 0036, Junjie Gao, Jiale Cheng, Zehan Qi, Yifan Xu, Shuntian Yao, Dan Zhang, Jinhua Du, Zhenyu Hou, Xin Lv, Minlie Huang, Yuxiao Dong, Jie Tang 0001. 2771-2791
- Position-aware Automatic Circuit Discovery. Tal Haklay, Hadas Orgad, David Bau, Aaron Mueller, Yonatan Belinkov. 2792-2817
- HyperFM: Fact-Centric Multimodal Fusion for Link Prediction over Hyper-Relational Knowledge Graphs. Yuhuan Lu, Weijian Yu, Xin Jing, Dingqi Yang. 2818-2830
- Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model. Gregor Geigle, Florian Schneider 0001, Carolin Holtermann, Chris Biemann, Radu Timofte, Anne Lauscher, Goran Glavas. 2831-2881
- Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation. Dimitris Gkoumas, Maria Liakata. 2882-2902
- Ensemble Watermarks for Large Language Models. Georg Niess, Roman Kern. 2903-2916
- Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities. Jiahui Geng, Thy Thy Tran, Preslav Nakov, Iryna Gurevych. 2917-2933
- TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge. Cheng-Han Chiang, Hung-yi Lee, Michal Lukasik. 2934-2952
- DioR: Adaptive Cognitive Detection and Contextual Retrieval Optimization for Dynamic Retrieval-Augmented Generation. Hanghui Guo, Jia Zhu 0003, Shimin Di, Weijie Shi, Zhangze Chen, Jiajie Xu 0001. 2953-2975
- Unveiling the Power of Source: Source-based Minimum Bayes Risk Decoding for Neural Machine Translation. Boxuan Lyu, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu Okumura. 2976-2994
- ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use. Junjie Ye, Zhengyin Du, Xuesong Yao, Weijian Lin, Yufei Xu, Zehui Chen, Zaiyuan Wang, Sining Zhu, Zhiheng Xi, Siyu Yuan, Tao Gui, Qi Zhang 0001, Xuanjing Huang 0001, Jiecao Chen. 2995-3021
- Mixture of insighTful Experts (MoTE): The Synergy of Reasoning Chains and Expert Mixtures in Self-Alignment. Zhili Liu, Yunhao Gou, Kai Chen 0023, Lanqing Hong, Jiahui Gao, Fei Mi, Yu Zhang 0006, Zhenguo Li, Xin Jiang 0002, Qun Liu 0001, James T. Kwok. 3022-3038
- MAPS: Motivation-Aware Personalized Search via LLM-Driven Consultation Alignment. Weicong Qin, Yi Xu 0003, Weijie Yu 0003, Chenglei Shen, Ming He, Jianping Fan 0001, Xiao Zhang 0034, Jun Xu 0001. 3039-3051
- Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework. Jundong Xu, Hao Fei 0001, Meng Luo 0002, Qian Liu 0012, Liangming Pan, William Yang Wang, Preslav Nakov, Mong-Li Lee, Wynne Hsu. 3052-3075
- LADM: Long-context Training Data Selection with Attention-based Dependency Measurement for LLMs. Jianghao Chen, Junhong Wu, Yangyifan Xu, Jiajun Zhang 0001. 3076-3090
- Iron Sharpens Iron: Defending Against Attacks in Machine-Generated Text Detection with Adversarial Training. Yuanfan Li, Zhaohan Zhang, Chengzhengxu Li, Chao Shen, Xiaoming Liu. 3091-3113
- Cultural Learning-Based Culture Adaptation of Language Models. Chen Cecilia Liu, Anna Korhonen, Iryna Gurevych. 3114-3134
- A-TASC: Asian TED-Based Automatic Subtitling Corpus. Yuhan Zhou, Naoki Yoshinaga 0001. 3135-3148
- Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training. Youliang Yuan, Wenxiang Jiao, Wenxuan Wang 0001, Jen-tse Huang 0001, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu. 3149-3167
- Token Prepending: A Training-Free Approach for Eliciting Better Sentence Embeddings from LLMs. Yuchen Fu, Zifeng Cheng, Zhiwei Jiang, Zhonghui Wang, Yafeng Yin 0002, Zhengliang Li, Qing Gu 0001. 3168-3181
- No Questions are Stupid, but some are Poorly Posed: Understanding Poorly-Posed Information-Seeking Questions. Neha Srikanth, Rachel Rudinger, Jordan Lee Boyd-Graber. 3182-3199
- Understanding Common Ground Misalignment in Goal-Oriented Dialog: A Case-Study with Ubuntu Chat Logs. Rupak Sarkar, Neha Srikanth, Taylor Pellegrin, Rachel Rudinger, Claire Bonial, Philip Resnik. 3200-3215
- Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models. Olga Loginova, Oleksandr Bezrukov, Ravi Shekhar, Alexey Kravets. 3216-3246
- Towards Reward Fairness in RLHF: From a Resource Allocation Perspective. Sheng Ouyang, Yulan Hu, Ge Chen 0006, Qingyang Li, Fuzheng Zhang, Yong Liu 0018. 3247-3259
- Taming LLMs with Gradient Grouping. Siyuan Li 0002, Juanxi Tian, Zedong Wang, Xin Jin, Zicheng Liu 0006, Wentao Zhang, Dan Xu. 3260-3279
- LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews. Sukannya Purkayastha, Zhuang Li 0001, Anne Lauscher, Lizhen Qu, Iryna Gurevych. 3280-3308
- Revisiting Common Assumptions about Arabic Dialects in NLP. Amr Keleg, Sharon Goldwater, Walid Magdy. 3309-3327
- Retrieve to Explain: Evidence-driven Predictions for Explainable Drug Target Identification. Ravi Patel, Angus Brayne, Rogier Hintzen, Daniel Jaroslawicz, Georgiana Neculae, Dane S. Corneil. 3328-3370
- Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas. Nishant Balepur, Vishakh Padmakumar, Fumeng Yang, Shi Feng 0005, Rachel Rudinger, Jordan Lee Boyd-Graber. 3371-3393
- Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above. Nishant Balepur, Rachel Rudinger, Jordan Lee Boyd-Graber. 3394-3418
- Detection of Human and Machine-Authored Fake News in Urdu. Muhammad Zain Ali, Yuxia Wang, Bernhard Pfahringer, Tony C. Smith. 3419-3428
- An Efficient Task-Oriented Dialogue Policy: Evolutionary Reinforcement Learning Injected by Elite Individuals. Yangyang Zhao, Ben Niu, Libo Qin 0001, Shihan Wang 0001. 3429-3442
- SR-LLM: Rethinking the Structured Representation in Large Language ModelJiahuan Zhang, Tianheng Wang, Ziyi Huang, Yulong Wu, Hanqing Wu, Dongbai Chen, Linfeng Song, Yue Zhang 0031, Guozheng Rao, Kaicheng Yu. 3443-3462 [doi]
- Taming Language Models for Text-attributed Graph Learning with Decoupled AggregationChuang Zhou 0002, Zhu Wang, Shengyuan Chen, Jiahe Du, Qiyuan Zheng, Zhaozhuo Xu, Xiao Huang 0001. 3463-3474 [doi]
- Contrastive Prompting Enhances Sentence Embeddings in LLMs through Inference-Time SteeringZifeng Cheng, Zhonghui Wang, Yuchen Fu, Zhiwei Jiang, Yafeng Yin 0002, Cong Wang 0034, Qing Gu 0001. 3475-3487 [doi]
- Cracking the Code of Hallucination in LVLMs with Vision-aware Head DivergenceJinghan He, Kuan Zhu, Haiyun Guo, Junfeng Fang, Zhenglin Hua, Yuheng Jia, Ming Tang 0001, Tat-Seng Chua, Jinqiao Wang. 3488-3501 [doi]
- Hierarchical Document Refinement for Long-context Retrieval-augmented GenerationJiajie Jin, Xiaoxi Li, Guanting Dong, Yuyao Zhang, Yutao Zhu 0001, Yongkang Wu, Zhonghua Li, Ye Qi, Zhicheng Dou. 3502-3520 [doi]
- Comparing Moral Values in Western English-speaking societies and LLMs with Word AssociationsChaoyi Xiang, Chunhua Liu, Simon De Deyne, Lea Frermann. 3521-3536 [doi]
- TEACH: A Contrastive Knowledge Adaptive Distillation Framework for Classical Chinese UnderstandingYuting Wei, Qi Meng, Yuanxing Xu, Bin Wu. 3537-3550 [doi]
- RAG-Critic: Leveraging Automated Critic-Guided Agentic Workflow for Retrieval Augmented GenerationGuanting Dong, Jiajie Jin, Xiaoxi Li, Yutao Zhu 0001, Zhicheng Dou, Ji-Rong Wen. 3551-3578 [doi]
- Progressive Multimodal Reasoning via Active RetrievalGuanting Dong, Chenghao Zhang 0001, Mengjie Deng, Yutao Zhu 0001, Zhicheng Dou, Ji-Rong Wen. 3579-3602 [doi]
- Pre-training Distillation for Large Language Models: A Design Space ExplorationHao Peng 0015, Xin Lv, Yushi Bai, Zijun Yao 0002, Jiajie Zhang, Lei Hou 0001, Juanzi Li. 3603-3618 [doi]
- Teaching Vision-Language Models to Ask: Resolving Ambiguity in Visual QuestionsPu Jian, Donglei Yu, Wen Yang, Shuo Ren, Jiajun Zhang. 3619-3638 [doi]
- LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context MultitasksYushi Bai, Shangqing Tu, Jiajie Zhang, Hao Peng 0015, Xiaozhi Wang, Xin Lv, Shulin Cao, Jiazheng Xu, Lei Hou 0001, Yuxiao Dong, Jie Tang 0001, Juanzi Li. 3639-3664 [doi]
- Battling against Tough Resister: Strategy Planning with Adversarial Game for Non-collaborative DialoguesHaiyang Wang, Zhiliang Tian, Yuchen Pan, Xin Song, Xin Niu 0002, Minlie Huang, Bin Zhou 0004. 3665-3685 [doi]
- Cross-model Transferability among Large Language Models on the Platonic Representations of ConceptsYoucheng Huang, Chen Huang 0006, Duanyu Feng, Wenqiang Lei, Jiancheng Lv 0001. 3686-3704 [doi]
- FoldMoE: Efficient Long Sequence MoE Training via Attention-MoE PipeliningGuichao Zhu, Lintian Lei, Yuhao Qing, Yichao Fu, Fanxin Li, Dong Huang 0005, Zekai Sun, Heming Cui. 3705-3717 [doi]
- LongReward: Improving Long-context Large Language Models with AI FeedbackJiajie Zhang, Zhongni Hou, Xin Lv, Shulin Cao, Zhenyu Hou, Yilin Niu, Lei Hou 0001, Yuxiao Dong, Ling Feng, Juanzi Li. 3718-3739 [doi]
- Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt StylesYuxi Xia, Pedro Henrique Luz de Araujo, Klim Zaporojets, Benjamin Roth 0001. 3740-3761 [doi]
- UTBoost: Rigorous Evaluation of Coding Agents on SWE-BenchBoxi Yu, Yuxuan Zhu, Pinjia He, Daniel Kang. 3762-3774 [doi]
- Towards Better Evaluation for Generated Patent ClaimsLekang Jiang, Pascal A Scherz, Stefan Goetz. 3775-3788 [doi]
- Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMsHaritz Puerto, Tilek Chubakov, Xiaodan Zhu, Harish Tayyar Madabushi, Iryna Gurevych. 3789-3808 [doi]
- Establishing Trustworthy LLM Evaluation via Shortcut Neuron AnalysisKejian Zhu, Shangqing Tu, Zhuoran Jin, Lei Hou 0001, Juanzi Li, Jun Zhao 0001. 3809-3822 [doi]
- Do Large Language Models have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMsYanzhu Guo, Simone Conia, Zelin Zhou, Min Li, Saloni Potdar, Henry Xiao. 3823-3838 [doi]
- Enhancing Character-Level Understanding in LLMs through Token Internal Structure LearningZhu Xu, Zhiqiang Zhao, Zihan Zhang, Yuchi Liu, Quanwei Shen, Fei Liu, Yu Kuang, Jian He, Conglin Liu. 3839-3853 [doi]
- Conformity in Large Language ModelsXiaochen Zhu, Caiqi Zhang, Tom Stafford 0002, Nigel Collier, Andreas Vlachos 0001. 3854-3872 [doi]
- Interpret and Improve In-Context Learning via the Lens of Input-Label MappingsChenghao Sun, Zhen Huang 0007, Yonggang Zhang 0003, Le Lu 0001, Houqiang Li, Xinmei Tian 0001, Xu Shen 0001, Jieping Ye. 3873-3895 [doi]
- Positional Overload: Positional Debiasing and Context Window Extension for Large Language Models using Set EncodingLukas Kinder, Lukas Edman, Alexander Fraser 0001, Tobias Käfer. 3896-3908 [doi]
- FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative SamplingWeilin Zhao, Tengyu Pan, Xu Han 0007, Yudi Zhang, Sun Ao, Yuxiang Huang, Kaihuo Zhang, Weilun Zhao, Yuxuan Li, Jie Zhou 0016, Hao Zhou 0012, Jianyong Wang 0001, Maosong Sun 0001, Zhiyuan Liu 0001. 3909-3921 [doi]
- VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward MechanismCongzhi Zhang, Jiawei Peng, Zhenglin Wang, Yilong Lai, Haowen Sun, Heng Chang, Fei Ma, Weijiang Yu. 3922-3941 [doi]
- Past Meets Present: Creating Historical Analogy with Large Language ModelsNianqi Li, Siyu Yuan, Jiangjie Chen, Jiaqing Liang, Feng Wei, Zujie Liang, Deqing Yang, Yanghua Xiao. 3942-3957 [doi]
- Meta-Reflection: A Feedback-Free Reflection Learning FrameworkYaoke Wang, Yun Zhu, Xintong Bao, Wenqiao Zhang, Suyang Dai, Kehan Chen, Wenqiang Li, Gang Huang, Siliang Tang, Yueting Zhuang. 3958-3976 [doi]
- Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar BooksChen Zhang, Jiuheng Lin, Xiao Liu, Zekai Zhang, Yansong Feng. 3977-3997 [doi]
- Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMsZhe Yang 0013, Yichang Zhang, Yudong Wang 0005, Ziyao Xu 0001, Junyang Lin, Zhifang Sui. 3998-4014 [doi]
- Automating Legal Interpretation with LLMs: Retrieval, Generation, and EvaluationKangcheng Luo, Quzhe Huang, Cong Jiang, Yansong Feng 0002. 4015-4047 [doi]
- Visual Evidence Prompting Mitigates Hallucinations in Large Vision-Language ModelsWei Li 0190, Zhen Huang 0007, Houqiang Li, Le Lu 0001, Yang Lu, Xinmei Tian 0001, Xu Shen 0001, Jieping Ye. 4048-4080 [doi]
- Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI CollaborationShao Zhang, Xihuai Wang, Wenhao Zhang, Chaoran Li, Junru Song, Tingyu Li, Lin Qiu, Xuezhi Cao, Xunliang Cai, Wen Yao, Weinan Zhang 0001, Xinbing Wang, Ying Wen 0001. 4081-4108 [doi]
- TokAlign: Efficient Vocabulary Adaptation via Token AlignmentChong Li, Jiajun Zhang, Chengqing Zong. 4109-4126 [doi]
- AdaEdit: Advancing Continuous Knowledge Editing For Large Language ModelsQi Li, Xiaowen Chu 0001. 4127-4149 [doi]
- The Impact of Token Granularity on the Predictive Power of Language Model SurprisalByung-Doh Oh, William Schuler. 4150-4162 [doi]
- Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language ModelsXiaochen Zhu, Georgi Karadzhov, Chenxi Whitehouse, Andreas Vlachos 0001. 4163-4183 [doi]
- BELLE: A Bi-Level Multi-Agent Reasoning Framework for Multi-Hop Question AnsweringTaolin Zhang 0001, Dongyang Li, Qizhou Chen, Chengyu Wang 0001, Xiaofeng He. 4184-4202 [doi]
- Dynamic and Generalizable Process Reward ModelingZhangyue Yin, Qiushi Sun, Zhiyuan Zeng 0004, Qinyuan Cheng, Xipeng Qiu, Xuanjing Huang 0001. 4203-4233 [doi]
- AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on HarmfulnessZixin Chen, Hongzhan Lin 0001, Kaixin Li, Ziyang Luo, Zhen Ye, Guang Chen 0003, Zhiyong Huang, Jing Ma 0004. 4234-4253 [doi]
- Towards Text-Image Interleaved RetrievalXin Zhang 0097, Ziqi Dai, Yongqi Li 0001, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Jun Yu, Wenjie Li 0002, Min Zhang 0005. 4254-4269 [doi]
- Large Margin Representation Learning for Robust Cross-lingual Named Entity RecognitionGuangcheng Zhu, Ruixuan Xiao, Haobo Wang, Zhen Zhu, Gengyu Lyu, Junbo Zhao. 4270-4291 [doi]
- An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical ReasoningWei Sun, Qianlong Du, Fuwei Cui, Jiajun Zhang. 4292-4305 [doi]
- QAEncoder: Towards Aligned Representation Learning in Question Answering SystemsZhengren Wang, Qinhan Yu, Shida Wei, Zhiyu Li, Feiyu Xiong, Xiaoxing Wang, Simin Niu, Hao Liang, Wentao Zhang. 4306-4332 [doi]
- Game Development as Human-LLM InteractionJiale Hong, Hongqiu Wu, Hai Zhao 0001. 4333-4354 [doi]
- Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent BiasesRena Wei Gao, Xuetong Wu, Tatsuki Kuribayashi, Mingrui Ye, Siya Qi, Carsten Roever, Yuanxing Liu 0001, Zheng Yuan, Jey Han Lau. 4355-4379 [doi]
- DeepSolution: Boosting Complex Engineering Solution Design via Tree-based Exploration and Bi-point ThinkingZhuoqun Li, Haiyang Yu 0003, Xuanang Chen, Hongyu Lin, Yaojie Lu 0001, Fei Huang 0002, Xianpei Han, Yongbin Li, Le Sun 0001. 4380-4396 [doi]
- SurveyPilot: an Agentic Framework for Automated Human Opinion Collection from Social MediaViet-Thanh Pham, Lizhen Qu, Zhuang Li 0001, Suraj Sharma, Gholamreza Haffari. 4397-4422 [doi]
- Sharper and Faster mean Better: Towards More Efficient Vision-Language Model for Hour-scale Long Video UnderstandingDaoze Zhang, Yuze Zhao, Jintao Huang, Yingda Chen. 4423-4439 [doi]
- Auto-Arena: Automating LLM Evaluations with Agent Peer Battles and Committee DiscussionsRuochen Zhao, Wenxuan Zhang 0001, Yew Ken Chia, Weiwen Xu, Deli Zhao, Lidong Bing. 4440-4463 [doi]
- How Humans and LLMs Organize Conceptual Knowledge: Exploring Subordinate Categories in ItalianAndrea Pedrotti, Giulia Rambelli, Caterina Villani, Marianna Bolognesi. 4464-4482 [doi]
- PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language ModelsJiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, Yaowei Wang 0001, Min Zhang 0005. 4483-4502 [doi]
- ProtoLens: Advancing Prototype Learning for Fine-Grained Interpretability in Text ClassificationBowen Wei, Ziwei Zhu. 4503-4523 [doi]
- Fine-grained Video Dubbing Duration Alignment with Segment Supervised Preference OptimizationChaoqun Cui, Liangbin Huang, Shijing Wang, Zhe Tong, Zhaolong Huang, Xiao Zeng, Xiaofeng Liu. 4524-4546 [doi]
- Sparse Latents Steer Retrieval-Augmented GenerationChunlei Xin, Shuheng Zhou 0001, Huijia Zhu, Weiqiang Wang, Xuanang Chen, Xinyan Guan, Yaojie Lu 0001, Hongyu Lin, Xianpei Han, Le Sun 0001. 4547-4562 [doi]
- Unveiling Language-Specific Features in Large Language Models via Sparse AutoencodersBoyi Deng, Yu Wan 0004, Baosong Yang, Yidan Zhang, Fuli Feng. 4563-4608 [doi]
- SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language ModelXun Liang 0001, Simin Niu, Zhiyu Li, Sensen Zhang, Hanyu Wang, Feiyu Xiong, Jason Zhaoxin Fan, Bo Tang, Jihao Zhao, Jiawei Yang, Shichao Song, Mengwei Wang. 4609-4631 [doi]
- AnRe: Analogical Replay for Temporal Knowledge Graph ForecastingGuo Tang, Zheng Chu, Wenxiang Zheng, Junjia Xiang, Yizhuo Li 0007, Weihao Zhang, Ming Liu 0004, Bing Qin 0001. 4632-4650 [doi]
- Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?Zhiyuan Zeng 0004, Qinyuan Cheng, Zhangyue Yin, Yunhua Zhou, Xipeng Qiu. 4651-4665 [doi]
- Text is All You Need: LLM-enhanced Incremental Social Event DetectionZitai Qiu, Congbo Ma, Jia Wu 0001, Jian Yang 0001. 4666-4680 [doi]
- Multimodal Pragmatic Jailbreak on Text-to-image ModelsTong Liu 0019, Zhixin Lai, Jiawen Wang, Gengyuan Zhang, Shuo Chen 0014, Philip Torr 0001, Vera Demberg, Volker Tresp, Jindong Gu. 4681-4720 [doi]
- Principled Understanding of Generalization for Generative Transformer Models in Arithmetic Reasoning TasksXingcheng Xu, Zibo Zhao, Haipeng Zhang, Yanqing Yang. 4721-4747 [doi]
- Discourse Relation-Enhanced Neural Coherence ModelingWei Liu 0145, Michael Strube 0001. 4748-4762 [doi]
- Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language ModelsKuofeng Gao, Shutao Xia, Ke Xu, Philip Torr 0001, Jindong Gu. 4763-4784 [doi]
- from Benign import Toxic: Jailbreaking the Language Model via Adversarial MetaphorsYu Yan, Sheng Sun, Zenghao Duan, Teli Liu, Min Liu, Zhiyi Yin, Lei Jingyu, Qi Li. 4785-4817 [doi]
- ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Multilingual Contrastive FrameworkHengyuan Zhang, Chenming Shang, Sizhe Wang, DongDong Zhang, Yiyao Yu, Feng Yao, Renliang Sun, Yujiu Yang, Furu Wei. 4818-4841 [doi]
- MorphMark: Flexible Adaptive Watermarking for Large Language ModelsZongqi Wang, Tianle Gu, Baoyuan Wu, Yujiu Yang. 4842-4860 [doi]
- A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context CompressionChenlong Deng, Zhisong Zhang, Kelong Mao, Shuaiyi Li, Xinting Huang, Dong Yu 0001, Zhicheng Dou. 4861-4879 [doi]
- On the Limit of Language Models as Planning FormalizersCassie Huang, Li Zhang. 4880-4904 [doi]
- Learning to Generate Structured Output with Schema Reinforcement LearningYaxi Lu, Haolun Li 0003, Xin Cong, Zhong Zhang, Yesai Wu, Yankai Lin, Zhiyuan Liu 0001, Fangming Liu, Maosong Sun 0001. 4905-4918 [doi]
- Enhancing Unsupervised Sentence Embeddings via Knowledge-Driven Data Augmentation and Gaussian-Decayed Contrastive LearningPeichao Lai, Zhengfeng Zhang, Wentao Zhang 0001, Fangcheng Fu, Bin Cui 0001. 4919-4940 [doi]
- Improve Safety Training of Large Language Models with Safety-Critical Singular Vectors LocalizationPeijian Gu, Quan Wang 0002, Zhendong Mao 0001. 4941-4954 [doi]
- WarriorCoder: Learning from Expert Battles to Augment Code Large Language ModelsHuawen Feng, Pu Zhao 0004, Qingfeng Sun, Can Xu, Fangkai Yang, Lu Wang 0029, Qianli Ma 0001, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang 0001, Qi Zhang 0066. 4955-4969 [doi]
- A Triple-View Framework for Fine-Grained Emotion Classification with Clustering-Guided Contrastive LearningJunqing Gong 0003, Binhan Yang, Wei Shen 0004. 4970-4984 [doi]
- Quantification of Large Language Model DistillationSunbowen Lee, Junting Zhou, Chang Ao, Kaige Li, Xeron Du, Sirui He, Haihong Wu, Tianci Liu 0011, Jiaheng Liu, Hamid Alinejad-Rokny, Min Yang 0007, Yitao Liang, Zhoufutu Wen, Shiwen Ni. 4985-5004 [doi]
- Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert ModelsZihan Qiu, Zeyu Huang, Bo Zheng 0007, Kaiyue Wen, Zekun Wang, Rui Men, Ivan Titov, Dayiheng Liu, Jingren Zhou 0001, Junyang Lin. 5005-5018 [doi]
- Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language ModelsJinyang Wu, Shuai Zhang 0014, Feihu Che, Mingkuan Feng, Pengpeng Shao, Jianhua Tao 0001. 5019-5039 [doi]
- Stepwise Reasoning Disruption Attack of LLMsJingyu Peng, Maolin Wang 0001, Xiangyu Zhao 0001, Kai Zhang 0038, Wanyu Wang, Pengyue Jia, Qidong Liu 0002, Ruocheng Guo, Qi Liu 0003. 5040-5058 [doi]
- Crowd Comparative Reasoning: Unlocking Comprehensive Evaluations for LLM-as-a-JudgeQiyuan Zhang, Yufei Wang 0005, Yuxin Jiang, Liangyou Li, Chuhan Wu, Yasheng Wang, Xin Jiang 0002, Lifeng Shang, Ruiming Tang, Fuyuan Lyu, Chen Ma 0001. 5059-5074 [doi]
- Lost in Multilinguality: Dissecting Cross-lingual Factual Inconsistency in Transformer Language ModelsMingyang Wang 0003, Heike Adel, Lukas Lange, Yihong Liu 0001, Ercong Nie, Jannik Strötgen, Hinrich Schütze. 5075-5094 [doi]
- Optimizing Decomposition for Optimal Claim VerificationYining Lu, Noah Ziems, Hy Dang, Meng Jiang 0001. 5095-5114 [doi]
- GradOT: Training-free Gradient-preserving Offsite-tuning for Large Language ModelsKai Yao, Zhaorui Tan, Penglei Gao, Lichun Li, Kaixin Wu, Yinggui Wang, Yuan Zhao, Yixin Ji, Jianke Zhu, Wei Wang 0002. 5115-5130 [doi]
- Knowledge Boundary of Large Language Models: A SurveyMoxin Li, Yong Zhao, Wenxuan Zhang 0001, Shuaiyi Li, Wenya Xie, See-Kiong Ng, Tat-Seng Chua, Yang Deng 0002. 5131-5157 [doi]
- Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT ReasoningHai-Long Sun, Zhun Sun, Houwen Peng, Han-Jia Ye. 5158-5171 [doi]
- MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation SystemJihao Zhao, Zhiyuan Ji, Zhaoxin Fan, Hanyu Wang, Simin Niu, Bo Tang, Feiyu Xiong, Zhiyu Li. 5172-5189 [doi]
- Mitigating Selection Bias with Node Pruning and Auxiliary OptionsHyeong Kyu Choi, Weijie Xu, Chi Xue, Stephanie Eckman, Chandan K. Reddy. 5190-5215 [doi]
- Dually Self-Improved Counterfactual Data Augmentation Using Large Language ModelLuhao Zhang, Xinyu Zhang, Linmei Hu, Dandan Song, Liqiang Nie. 5216-5227 [doi]
- RPO: Retrieval Preference Optimization for Robust Retrieval-Augmented GenerationShi-Qi Yan, Quan Liu 0003, Zhen-Hua Ling. 5228-5240 [doi]
- Learning to Reason from Feedback at Test-TimeYanyang Li, Michael R. Lyu, Liwei Wang 0009. 5241-5253 [doi]
- L-CiteEval: A Suite for Evaluating Fidelity of Long-context ModelsZecheng Tang, Keyan Zhou, Juntao Li, Baibei Ji, Jianye Hou, Min Zhang. 5254-5277 [doi]
- SECRET: Semi-supervised Clinical Trial Document Similarity SearchTrisha Das, Afrah Shafquat, Mandis Beigi, Jacob Aptekar, Jimeng Sun 0001. 5278-5291 [doi]
- Geometric Signatures of Compositionality Across a Language Model's LifetimeJin Hwa Lee, Thomas Jiralerspong, Lei Yu, Yoshua Bengio, Emily Cheng. 5292-5320 [doi]
- Pattern Recognition or Medical Knowledge? The Problem with Multiple-Choice Questions in MedicineMaxime Griot, Jean Vanderdonckt, Demet Yüksel, Coralie Hemptinne. 5321-5341 [doi]
- People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated textJenna Russell, Marzena Karpinska, Mohit Iyyer. 5342-5373 [doi]
- YuLan-Mini: Pushing the Limits of Open Data-efficient Language ModelYiwen Hu, Huatong Song, Jie Chen 0007, Jia Deng, Jiapeng Wang, Kun Zhou 0002, Yutao Zhu 0001, Jinhao Jiang, Zican Dong, Yang Lu, Xu Miao, Xin Zhao 0018, Ji-Rong Wen. 5374-5400 [doi]
- Your Model is Overconfident, and Other Lies We Tell OurselvesTimothee Mickus, Aman Sinha 0002, Raúl Vázquez. 5401-5417 [doi]
- Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual InterventionWeixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch. 5418-5433 [doi]
- Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language ModelsKyeonghyun Kim, Jinhee Jang, Juhwan Choi, Yoonji Lee, Kyohoon Jin, Youngbin Kim. 5434-5452 [doi]
- What is Stigma Attributed to? A Theory-Grounded, Expert-Annotated Interview Corpus for Demystifying Mental-Health StigmaHan Meng, Yancan Chen, Yunan Li, Yitian Yang, Jungup Lee, Renwen Zhang, Yi-Chieh Lee. 5453-5490 [doi]
- ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution ErrorsYuguo Yin, Yuxin Xie 0004, Wenyuan Yang, Dongchao Yang, Jinghan Ru, Xianwei Zhuang, Liming Liang, Yuexian Zou. 5491-5504 [doi]
- Enhancing Transformers for Generalizable First-Order Logical EntailmentTianshi Zheng, Jiazheng Wang, Zihao Wang 0001, Jiaxin Bai, Hang Yin 0008, Zheye Deng, Yangqiu Song, Jianxin Li 0002. 5505-5524 [doi]
- Self-Taught Agentic Long Context UnderstandingYufan Zhuang, Xiaodong Yu, Jialian Wu, Ximeng Sun, Ze Wang 0008, Jiang Liu 0014, Yusheng Su, Jingbo Shang, Zicheng Liu 0001, Emad Barsoum. 5525-5537 [doi]
- Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model TrainingShahrad Mohammadzadeh, Juan David Guerra, Marco Bonizzato, Reihaneh Rabbany, Golnoosh Farnadi. 5538-5554 [doi]
- OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task SynthesisQiushi Sun, Kanzhi Cheng, Zichen Ding 0002, Chuanyang Jin, Yian Wang 0003, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li 0001, Junxian He, Yu Qiao 0001, Zhiyong Wu 0003. 5555-5579 [doi]
- CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative DrafterYepeng Weng, Dianwen Mei, Huishi Qiu, Xujie Chen, Li Liu, Jiang Tian, Zhongchao Shi. 5580-5593 [doi]
- ConSim: Measuring Concept-Based Explanations' Effectiveness with Automated SimulatabilityAntonin Poché, Alon Jacovi, Agustin Martin Picard, Victor Boutin, Fanny Jourdan. 5594-5615 [doi]
- Decoding Reading Goals from Eye MovementsOmer Shubi, Cfir Avraham Hadar, Yevgeni Berzak. 5616-5637 [doi]
- Uncovering Visual-Semantic Psycholinguistic Properties from the Distributional Structure of Text Embedding SpaceSi Wu, Sebastian Bruch 0001. 5638-5649 [doi]
- GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI AgentBin Xie, Rui Shao 0001, Gongwei Chen, Kaiwen Zhou, Yinchuan Li, Jie Liu 0001, Min Zhang 0005, Liqiang Nie. 5650-5667 [doi]
- P² Law: Scaling Law for Post-Training After Model PruningXiaodong Chen, Yuxuan Hu, Xiaokang Zhang, Yanling Wang, Cuiping Li, Hong Chen, Jing Zhang. 5668-5686 [doi]
- Making FETCH! Happen: Finding Emergent Dog Whistles Through Common HabitatsKuleen Sasse, Carlos Alejandro Aguirre, Isabel Cachola, Sharon Levy, Mark Dredze. 5687-5709 [doi]
- Lost in the Context: Insufficient and Distracted Attention to Contexts in Preference ModelingShihan Dou, Jiayi Chen, Chenhao Huang, Feng Chen 0042, Wei Chengzhi, Huiyuan Zheng, Shichun Liu, Yan Liu 0002, Chenxiao Liu, Chao Xin, Lin Yan, Zongzhang Zhang, Tao Gui, Qi Zhang 0001, Xuanjing Huang 0001. 5710-5728 [doi]
- Entailment-Preserving First-order Logic Representations in Natural Language EntailmentJinu Lee, Qi Liu, Runzhi Ma, Vincent Han, Ziqi Wang 0003, Heng Ji 0001, Julia Hockenmaier. 5729-5742 [doi]
- Enhancing Multimodal Continual Instruction Tuning with BranchLoRADuzhen Zhang, Yong Ren, Zhong-Zhi Li, Yahan Yu, Jiahua Dong 0001, Chenxing Li, Zhilong Ji, Jinfeng Bai. 5743-5756 [doi]
- Enhancing Automated Interpretability with Output-Centric Feature DescriptionsYoav Gur-Arieh, Roy Mayan, Chen Agassy, Atticus Geiger, Mor Geva. 5757-5778 [doi]
- Towards Effective and Efficient Continual Pre-training of Large Language ModelsJie Chen 0007, Zhipeng Chen 0001, Jiapeng Wang, Kun Zhou 0002, Yutao Zhu 0001, Jinhao Jiang, Yingqian Min, Xin Zhao 0018, Zhicheng Dou, Jiaxin Mao, Yankai Lin, Ruihua Song, Jun Xu 0001, Xu Chen 0017, Rui Yan 0001, Zhewei Wei, Di Hu 0001, Wenbing Huang 0001, Ji-Rong Wen. 5779-5795 [doi]
- Efficient Universal Goal Hijacking with Semantics-guided Prompt OrganizationYihao Huang 0001, Chong Wang 0013, Xiaojun Jia, Qing Guo 0005, Felix Juefei-Xu, Jian Zhang 0001, Yang Liu 0003, Geguang Pu. 5796-5816 [doi]
- mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document UnderstandingAnwen Hu, Haiyang Xu 0001, Liang Zhang, Jiabo Ye, Ming Yan 0008, Ji Zhang 0011, Qin Jin, Fei Huang 0002, Jingren Zhou 0001. 5817-5834 [doi]
- What Makes a Good Natural Language Prompt?Do Xuan Long, Duy Dinh, Ngoc Hai Nguyen, Kenji Kawaguchi, Nancy F. Chen, Shafiq Joty, Min-Yen Kan. 5835-5873 [doi]
- X-TURING: Towards an Enhanced and Efficient Turing Test for Long-Term Dialogue AgentsWeiqi Wu, Hongqiu Wu, Hai Zhao 0001. 5874-5889 [doi]
- Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoralShivani Kumar, David Jurgens. 5890-5912 [doi]
- Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language ModelsZheyuan Liu 0010, Guangyao Dou, Xiangchi Yuan, Chunhui Zhang, Zhaoxuan Tan, Meng Jiang 0001. 5913-5933 [doi]
- NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional ReasoningZheyuan Zhang, Yiyang Li, Nhi Ha Lan Le, Zehong Wang, Tianyi Ma, Vincent Galassi, Keerthiram Murugesan, Nuno Moniz, Werner Geyer, Nitesh V. Chawla, Chuxu Zhang, Yanfang Ye 0001. 5934-5966 [doi]
- ReLearn: Unlearning via Learning for Large Language ModelsHaoming Xu, Ningyuan Zhao, Liming Yang, Sendong Zhao, Shumin Deng, Mengru Wang, Bryan Hooi, Nay Oo, Huajun Chen, Ningyu Zhang 0001. 5967-5987 [doi]
- Understanding Cross-Domain Adaptation in Low-Resource Topic ModelingPritom Saha Akash, Kevin Chen-Chuan Chang. 5988-6001 [doi]
- UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language ModelsBoyang Xue, Fei Mi, Qi Zhu 0007, Hongru Wang 0003, Rui Wang 0092, Sheng Wang, Erxin Yu, Xuming Hu, Kam-Fai Wong. 6002-6024 [doi]
- CoT-Valve: Length-Compressible Chain-of-Thought TuningXinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang. 6025-6035 [doi]
- HoH: A Dynamic Benchmark for Evaluating the Impact of Outdated Information on Retrieval-Augmented GenerationJie Ouyang, Tingyue Pan, Mingyue Cheng, Ruiran Yan, Yucong Luo, Jiaying Lin, Qi Liu. 6036-6063 [doi]
- Uncertainty Propagation on LLM AgentQiwei Zhao, Dong Li, Yanchi Liu, Wei Cheng 0002, Yiyou Sun, Mika Oishi, Takao Osaki, Katsushi Matsuda, Huaxiu Yao, Chen Zhao, Haifeng Chen, Xujiang Zhao. 6064-6073 [doi]
- Beyond Position: the emergence of wavelet-like properties in TransformersValeria Ruscio, Umberto Nanni, Fabrizio Silvestri. 6074-6088 [doi]
- Are the Hidden States Hiding Something? Testing the Limits of Factuality-Encoding Capabilities in LLMsGiovanni Servedio, Alessandro De Bellis, Dario Di Palma, Vito Walter Anelli, Tommaso Di Noia. 6089-6104 [doi]
- Disentangling Biased Knowledge from Reasoning in Large Language Models via Machine UnlearningZheyuan Liu, Suraj Maharjan, Fanyou Wu, Rahil Parikh, Belhassen Bayar, Srinivasan H. Sengamedu, Meng Jiang. 6105-6123 [doi]
- LLaMAs Have Feelings Too: Unveiling Sentiment and Emotion Representations in LLaMA Models Through ProbingDario Di Palma, Alessandro De Bellis, Giovanni Servedio, Vito Walter Anelli, Fedelucio Narducci, Tommaso Di Noia. 6124-6142 [doi]
- CxGGEC: Construction-Guided Grammatical Error CorrectionYayu Cao, Tianxiang Wang, Lvxiaowei Xu, Zhenyao Wang, Ming Cai. 6143-6156 [doi]
- Beyond Sequences: Two-dimensional Representation and Dependency Encoding for Code GenerationXiangyu Zhang 0005, Yu Zhou 0010, Guang Yang 0019, Wei Cheng, Taolue Chen 0001. 6157-6172 [doi]
- HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMsQing Li 0038, Jiahui Geng, Zongxiong Chen, Derui Zhu, Yuxia Wang, Congbo Ma, Chenyang Lyu, Fakhri Karray. 6173-6186 [doi]
- What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific PresentationsDongqi Liu 0001, Chenxi Whitehouse, Xi Yu, Louis Mahon, Rohit Saxena, Zheng Zhao, Yifu Qiu, Mirella Lapata, Vera Demberg. 6187-6210 [doi]
- NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question AnsweringRuisheng Cao, Hanchong Zhang, Tiancheng Huang, Zhangyi Kang, Yuxin Zhang, Liangtai Sun, Hanqi Li, Yuxun Miao, Shuai Fan 0005, Lu Chen 0002, Kai Yu 0004. 6211-6239 [doi]
- ProvBench: A Benchmark of Legal Provision Recommendation for Contract Auto-ReviewingXiuxuan Shen, Zhongyuan Jiang, Junsan Zhang, Junxiao Han, Yao Wan 0001, Chengjie Guo, Bingcheng Liu, Jie Wu, Renxiang Li, Philip S. Yu. 6240-6254 [doi]
- F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow MatchingYushen Chen, Zhikang Niu, Ziyang Ma 0001, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu 0004, Xie Chen 0001. 6255-6271 [doi]
- AutoMedEval: Harnessing Language Models for Automatic Medical Capability EvaluationXiechi Zhang, Zetian Ouyang, Linlin Wang, Gerard de Melo, Zhu Cao, Xiaoling Wang, Ya Zhang 0002, Yanfeng Wang 0001, Liang He 0001. 6272-6285 [doi]
- CoT-based Synthesizer: Enhancing LLM Performance through Answer SynthesisBohan Zhang, Xiaokang Zhang, Jing Zhang, Jifan Yu, Sijia Luo, Jie Tang. 6286-6303 [doi]
- Efficiently Identifying Watermarked Segments in Mixed-Source TextsXuandong Zhao, Chenwen Liao, Yu-Xiang Wang, Lei Li. 6304-6316 [doi]
- Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning TasksFangru Lin, Shaoguang Mao, Emanuele La Malfa, Valentin Hofmann, Adrian de Wynter, Xun Wang 0012, Si-Qing Chen, Michael J. Wooldridge, Janet B. Pierrehumbert, Furu Wei. 6317-6342 [doi]
- Towards a More Generalized Approach in Open Relation ExtractionQing Wang, Yuepei Li, Qiao Qiao, Kang Zhou 0002, Qi Li 0012. 6343-6354 [doi]
- Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back HomeViktor Moskvoretskii, Maria Marina, Mikhail Salnikov, Nikolay Ivanov, Sergey Pletenev, Daria Galimzianova, Nikita Krayko, Vasily Konovalov, Irina Nikishina, Alexander Panchenko. 6355-6384 [doi]
- Evaluating Language Models as Synthetic Data GeneratorsSeungone Kim, Juyoung Suk, Xiang Yue, Vijay Viswanathan 0002, Seongyun Lee, Yizhong Wang, Kiril Gashteovski, Carolin Lawrence, Sean Welleck, Graham Neubig. 6385-6403 [doi]
- Can Graph Descriptive Order Affect Solving Graph Problems with LLMs?Yuyao Ge, Shenghua Liu, Baolong Bi, Yiwei Wang 0001, Lingrui Mei, Wenjie Feng 0001, Lizhe Chen, Xueqi Cheng. 6404-6420 [doi]
- Learning to Rewrite: Generalized LLM-Generated Text DetectionWei Hao, Ran Li, Weiliang Zhao, Junfeng Yang, Chengzhi Mao. 6421-6434 [doi]
- Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree SearchLinhao Yu, Xingguang Ji, Yahui Liu, Fanheng Kong, Chenxi Sun, Jingyuan Zhang, Hongzhi Zhang, Victoria W., Fuzheng Zhang, Deyi Xiong. 6435-6462 [doi]
- GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMsMaxim Zhelnin, Viktor Moskvoretskii, Egor Shvetsov, Mariya Krylova, Egor Venediktov, Aleksandr Zuev, Evgeny Burnaev. 6463-6480 [doi]
- Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability HypothesisHong Huang, Dapeng Wu 0001. 6481-6496 [doi]
- Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal ModelsAtsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Qing Yu, Go Irie, Yixuan Li 0001, Hai Helen Li, Ziwei Liu 0002, Kiyoharu Aizawa. 6497-6540 [doi]
- AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language ModelsYuhang Wu, Wenmeng Yu, Yean Cheng, Yan Wang, Xiaohan Zhang, Jiazheng Xu, Ming Ding 0004, Yuxiao Dong. 6541-6558 [doi]
- Biased LLMs can Influence Political Decision-MakingJillian Fisher, Shangbin Feng, Robert Aron, Thomas Richardson, Yejin Choi, Daniel W. Fisher, Jennifer Pan, Yulia Tsvetkov, Katharina Reinecke. 6559-6607 [doi]
- LexTempus: Enhancing Temporal Generalizability of Legal Language Models Through Dynamic Mixture of ExpertsT. Y. S. S. Santosh, Tuan-Quang Vuong. 6608-6624 [doi]
- That is Unacceptable: the Moral Foundations of CancelingSoda Marem Lo, Oscar Araque, Rajesh Sharma, Marco Antonio Stranisci. 6625-6639 [doi]
- FloorPlan-LLaMa: Aligning Architects' Feedback and Domain Knowledge in Architectural Floor Plan GenerationJun Yin, Pengyu Zeng, Haoyuan Sun, Yuqin Dai, Han Zheng, Miao Zhang, Yachao Zhang, Shuai Lu. 6640-6662 [doi]
- TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem UnderstandingMax Ku, Cheuk Hei Chong, Jonathan Leung, Krish Shah, Alvin Yu, Wenhu Chen. 6663-6684 [doi]
- FineReason: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle SolvingGuizhen Chen, Weiwen Xu, Hao Zhang 0048, Hou Pong Chan, Chaoqun Liu, Lidong Bing, Deli Zhao, Anh Tuan Luu, Yu Rong. 6685-6715 [doi]
- The TIP of the Iceberg: Revealing a Hidden Class of Task-in-Prompt Adversarial Attacks on LLMsSergey Berezin, Reza Farahbakhsh, Noël Crespi. 6716-6730 [doi]
- Identifying Reliable Evaluation Metrics for Scientific Text RevisionLéane Jourdan, Nicolas Hernandez, Florian Boudin, Richard Dufour. 6731-6756 [doi]
- Can Language Models Reason about Individualistic Human Values and Preferences?Liwei Jiang, Taylor Sorensen, Sydney Levine, Yejin Choi 0001. 6757-6794 [doi]
- BERT-like Models for Slavic Morpheme SegmentationDmitry Morozov, Lizaveta Astapenka, Anna V. Glazkova, Timur Garipov, Olga Lyashevskaya. 6795-6815 [doi]
- Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token RecyclingXianzhen Luo, Yixuan Wang, Qingfu Zhu, Zhiming Zhang, Xuanyu Zhang, Qing Yang 0033, Dongliang Xu. 6816-6831 [doi]
- Unlocking General Long Chain-of-Thought Reasoning Capabilities of Large Language Models via Representation EngineeringXinyu Tang 0004, Xiaolei Wang 0005, Zhihao Lv, Yingqian Min, Xin Zhao 0018, Binbin Hu, Ziqi Liu, Zhiqiang Zhang 0012. 6832-6849 [doi]
- Drift: Enhancing LLM Faithfulness in Rationale Generation via Dual-Reward Probabilistic InferenceJiazheng Li 0002, Hanqi Yan, Yulan He 0001. 6850-6866 [doi]
- Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMsAngelina Wang, Michelle Phan, Daniel E. Ho, Sanmi Koyejo. 6867-6893 [doi]
- MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language ModelsShojiro Yamabe, Futa Kai Waseda, Tsubasa Takahashi 0001, Koki Wataoka. 6894-6916 [doi]
- Dynamic Scaling of Unit Tests for Code Reward ModelingZeyao Ma, Xiaokang Zhang, Jing Zhang, Jifan Yu, Sijia Luo, Jie Tang. 6917-6935 [doi]
- UniConv: Unifying Retrieval and Response Generation for Large Language Models in ConversationsFengran Mo, Yifan Gao 0001, Chuan Meng, Xin Liu 0039, Zhuofeng Wu, Kelong Mao, Zhengyang Wang, Pei Chen, Zheng Li 0018, Xian Li, Bing Yin, Meng Jiang 0001. 6936-6949 [doi]
- Tracking Life's Ups and Downs: Mining Life Events from Social Media Posts for Mental Health AnalysisMinghao Lv, Siyuan Chen, Haoan Jin, Minghao Yuan, Qianqian Ju, Yujia Peng, Kenny Q. Zhu, Mengyue Wu. 6950-6965 [doi]
- ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style ControlShengpeng Ji, Qian Chen 0003, Wen Wang 0001, Jialong Zuo, Minghui Fang 0002, Ziyue Jiang 0004, Hai Huang 0013, Zehan Wang 0001, Xize Cheng, Siqi Zheng, Zhou Zhao 0001. 6966-6981 [doi]
- PIC: Unlocking Long-Form Text Generation Capabilities of Large Language Models via Position ID CompressionHaoran Que, Wenge Rong. 6982-6995 [doi]
- Towards Effective Extraction and Evaluation of Factual ClaimsDasha Metropolitansky, Jonathan Larson. 6996-7045 [doi]
- Beyond Facts: Evaluating Intent Hallucination in Large Language ModelsYijie Hao, Haofei Yu, Jiaxuan You. 7046-7069 [doi]
- A Systematic Study of Compositional Syntactic Transformer Language ModelsYida Zhao, Hao Xve, Xiang Hu, Kewei Tu. 7070-7083 [doi]
- M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation EvaluationZhaopeng Feng, Jiayuan Su, Jiamei Zheng, Jiahan Ren, Yan Zhang 0004, Jian Wu 0001, Hongwei Wang 0001, Zuozhu Liu. 7084-7107 [doi]
- SongComposer: A Large Language Model for Lyric and Melody Generation in Song CompositionShuangrui Ding, Zihan Liu, Xiaoyi Dong, Pan Zhang 0001, Rui Qian 0001, Junhao Huang, Conghui He, Dahua Lin, Jiaqi Wang 0003. 7108-7127 [doi]
- Personalized Text Generation with Contrastive Activation SteeringJinghao Zhang, Yuting Liu 0003, Wenjie Wang 0007, Qiang Liu 0006, Shu Wu, Liang Wang 0001, Tat-Seng Chua. 7128-7141 [doi]
- Gumbel Reranking: Differentiable End-to-End Reranker OptimizationSiyuan Huang 0003, Zhiyuan Ma, Jintao Du, Changhua Meng, Weiqiang Wang, Jingwen Leng, Minyi Guo, Zhouhan Lin. 7142-7161 [doi]
- Hybrid Preferences: Learning to Route Instances for Human vs. AI FeedbackLester James Validad Miranda, Yizhong Wang, Yanai Elazar, Sachin Kumar 0009, Valentina Pyatkin, Faeze Brahman, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi. 7162-7200 [doi]
- SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event DetectionYi-Fan Lu, Xian-Ling Mao, Tian Lan 0003, Tong Zhang, Yu-shi Zhu, Heyan Huang. 7201-7218 [doi]
- The UD-NewsCrawl Treebank: Reflections and Challenges from a Large-scale Tagalog Syntactic Annotation ProjectAngelina Aspra Aquino, Lester James Validad Miranda, Elsie Marie T. Or. 7219-7239 [doi]
- DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based DistillationJennifer Chen, Aidar Myrzakhan, Yaxin Luo, Hassaan Muhammad Khan, Sondos Mahmoud Bsharat, Zhiqiang Shen. 7240-7260 [doi]
- G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent SystemsShilong Wang, Guibin Zhang, Miao Yu, Guancheng Wan, Fanci Meng, Chongye Guo, Kun Wang, Yang Wang. 7261-7276 [doi]
- Deontological Keyword Bias: The Impact of Modal Expressions on Normative Judgments of Language ModelsBumjin Park, Leejinsil Leejinsil, Jaesik Choi. 7277-7296 [doi]
- LegalReasoner: Step-wised Verification-Correction for Legal Judgment ReasoningWeijie Shi, Han Zhu, Jiaming Ji, Mengze Li 0001, Jipeng Zhang, Ruiyuan Zhang, Jia Zhu 0003, Jiajie Xu 0001, Sirui Han, Yike Guo. 7297-7313 [doi]
- Rolling the DICE on Idiomaticity: How LLMs Fail to Grasp ContextMaggie Mi, Aline Villavicencio, Nafise Sadat Moosavi. 7314-7332 [doi]
- ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code GenerationXuanle Zhao, Xianzhen Luo, Qi Shi 0002, Chi Chen 0005, Shuo Wang 0013, Zhiyuan Liu 0001, Maosong Sun 0001. 7333-7348 [doi]
- The Cross-linguistic Role of Animacy in Grammar StructuresNina Gregorio, Matteo Gay, Sharon Goldwater, Edoardo M. Ponti. 7349-7363 [doi]
- LexGen: Domain-aware Multilingual Lexicon GenerationAyush Maheshwari, Atul Kumar Singh, N. J. Karthika, Krishnakant Bhatt, Preethi Jyothi, Ganesh Ramakrishnan. 7364-7375 [doi]
- How to Train Long-Context Language Models (Effectively)Tianyu Gao 0001, Alexander Wettig, Howard Yen, Danqi Chen 0001. 7376-7399 [doi]
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction FusionQizhi Pei, Lijun Wu 0003, Zhuoshi Pan, Yu Li 0006, Honglin Lin, Chenlin Ming, Xin Gao 0001, Conghui He, Rui Yan 0001. 7400-7420 [doi]
- Mining Complex Patterns of Argumentative Reasoning in Natural Language DialogueRamon Ruiz-Dolz, Zlata Kikteva, John Lawrence. 7421-7435 [doi]
- OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser UseXueyu Hu, Tao Xiong, Biao Yi, Zishu Wei, Ruixuan Xiao, Yurun Chen 0004, Jiasheng Ye, Meiling Tao, Xiangxin Zhou, Ziyu Zhao 0001, Yuhuai Li, Shengze Xu, Shenzhi Wang, Xinchen Xu, Shuofei Qiao, Zhaokai Wang, Kun Kuang, Tieyong Zeng, Liang Wang 0001, Jiwei Li 0001, Yuchen Eleanor Jiang, Wangchunshu Zhou, Guoyin Wang 0002, Keting Yin, Zhou Zhao 0001, Hongxia Yang, Fan Wu 0006, Shengyu Zhang 0001, Fei Wu 0001. 7436-7465 [doi]
- Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language PlanningMingfei Lau, Qian Chen, Yeming Fang, Tingting Xu, Tongzhou Chen, Pavel Golik. 7466-7492 [doi]
- LLM as a Broken Telephone: Iterative Generation Distorts InformationAmr Mohamed 0005, Mingmeng Geng, Michalis Vazirgiannis, Guokan Shang. 7493-7509 [doi]
- VLM2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual CuesJianshu Zhang 0003, Dongyu Yao, Renjie Pi, Paul Pu Liang, Yi R. Fung 0001. 7510-7545 [doi]
- Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality EstimationXiang Geng, Zhejian Lai, Jiajun Chen, Hao Yang, Shujian Huang. 7546-7560 [doi]
- Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative ModelsFan Zhang, Shulin Tian, Ziqi Huang, Yu Qiao 0001, Ziwei Liu 0002. 7561-7582 [doi]
- Large Language Models Struggle to Describe the Haystack without Human Help: A Social Science-Inspired Evaluation of Topic ModelsZongxia Li, Lorena Calvo-Bartolomé, Alexander Miserlis Hoyle, Paiheng Xu, Daniel Kofi Stephens, Juan Francisco Fung, Alden Dima, Jordan Lee Boyd-Graber. 7583-7604 [doi]
- ActiView: Evaluating Active Perception Ability for Multimodal Large Language ModelsZiyue Wang, Chi Chen, Fuwen Luo, Yurui Dong, Yuanchi Zhang, Yuzhuang Xu, Xiaolong Wang, Peng Li, Yang Liu. 7605-7633 [doi]
- Enough Coin Flips Can Make LLMs Act BayesianRitwik Gupta, Rodolfo Corona, Jiaxin Ge, Eric Wang, Dan Klein 0001, Trevor Darrell, David M. Chan. 7634-7655 [doi]
- GAMEBoT: Transparent Assessment of LLM Reasoning in GamesWenye Lin, Jonathan Roberts 0004, Yunhan Yang, Samuel Albanie, Zongqing Lu 0002, Kai Han 0001. 7656-7682 [doi]
- A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key TokensZhijie Nie, Richong Zhang, Zhanyu Wu. 7683-7694 [doi]
- Commonsense Reasoning in Arab CultureAbdelrahman Boda Sadallah, Junior Cedric Tonga, Khalid Almubarak, Saeed Almheiri, Farah Atif, Chatrine Qwaider, Karima Kadaoui, Sara Shatnawi, Yaser Alesh, Fajri Koto. 7695-7710 [doi]
- AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based AgentsJunting Lu, Zhiyang Zhang, Fangkai Yang, Jue Zhang, Lu Wang 0029, Chao Du, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang 0001, Qi Zhang 0066. 7711-7743 [doi]
- Translation and Fusion Improves Cross-lingual Information ExtractionYang Chen, Vedaant Shah, Alan Ritter. 7744-7764 [doi]
- Conditional Dichotomy Quantification via Geometric EmbeddingShaobo Cui 0006, Wenqing Liu, Yiyang Feng, Jiawei Zhou, Boi Faltings. 7765-7791 [doi]
- Aligning Large Language Models with Implicit Preferences from User-Generated ContentZhaoxuan Tan, Zheng Li 0018, Tianyi Liu, Haodong Wang, Hyokun Yun, Ming Zeng 0001, Pei Chen, Zhihan Zhang 0001, Yifan Gao 0001, Ruijie Wang 0004, Priyanka Nigam, Bing Yin, Meng Jiang 0001. 7792-7820 [doi]
- VQAGuider: Guiding Multimodal Large Language Models to Answer Complex Video QuestionsYuyan Chen, Jiyuan Jia, Jiaxin Lu, Siyue Li, Yu Guan, Ming Yang 0007, Qingpei Guo. 7821-7834 [doi]
- Large Language Models are Good Relational LearnersFang Wu, Vijay Prakash Dwivedi, Jure Leskovec. 7835-7854 [doi]
- SpaRE: Enhancing Spatial Reasoning in Vision-Language Models with Synthetic DataMichael Ogezi, Freda Shi. 7855-7875 [doi]
- Distilling an End-to-End Voice Assistant Without Instruction Training DataWilliam Barr Held, Yanzhe Zhang, Weiyan Shi, Minzhi Li, Michael J. Ryan, Diyi Yang. 7876-7891 [doi]
- CoMet: Metaphor-Driven Covert Communication for Multi-Agent Language GamesShuhang Xu, Fangwei Zhong. 7892-7917 [doi]
- CER: Confidence Enhanced Reasoning in LLMsAli Razghandi, Seyed Mohammad Hadi Hosseini, Mahdieh Soleymani Baghshah. 7918-7938 [doi]
- Watermarking Large Language Models: An Unbiased and Low-risk MethodMinjia Mao, Dongjun Wei, Zeyu Chen, Xiao Fang, Michael Chau. 7939-7960 [doi]
- On Synthetic Data Strategies for Domain-Specific Generative RetrievalHaoyang Wen, Jiang Guo, Yi Zhang, Jiarong Jiang, Zhiguo Wang. 7961-7976 [doi]
- LLM Braces: Straightening Out LLM Predictions with Relevant Sub-UpdatesYing Shen, Lifu Huang. 7977-7992 [doi]
- CONFETTI: Conversational Function-Calling Evaluation Through Turn-Level InteractionsTamer Alkhouli, Katerina Margatina, James Gung, Raphael Shu, Claudia Zaghi, Monica Sunkara, Yi Zhang 0001. 7993-8006 [doi]
- Evaluating Theory of (an uncertain) Mind: Predicting the Uncertain Beliefs of Others from Conversational CuesAnthony B. Sicilia, Malihe Alikhani. 8007-8021 [doi]
- Uncertainty in Causality: A New FrontierShaobo Cui 0006, Luca Mouchel, Boi Faltings. 8022-8044 [doi]
- SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMsMichael J. Ryan, Omar Shaikh, Aditri Bhagirath, Daniel Frees, William Barr Held, Diyi Yang. 8045-8078 [doi]
- When People are Floods: Analyzing Dehumanizing Metaphors in Immigration Discourse with Large Language ModelsJulia Mendelsohn, Ceren Budak. 8079-8103 [doi]
- AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety DetectionWeidi Luo, Shenghong Dai, Xiaogeng Liu, Suman Banerjee 0001, Huan Sun 0001, Muhao Chen 0001, Chaowei Xiao. 8104-8139 [doi]
- Improving Model Factuality with Fine-grained Critique-based EvaluatorYiqing Xie, Wenxuan Zhou, Pradyot Prakash, Di Jin, Yuning Mao, Quintin Fettes, Arya Talebzadeh, Sinong Wang, Han Fang, Carolyn P. Rosé, Daniel Fried, Hejia Zhang. 8140-8155 [doi]
- Building a Long Text Privacy Policy Corpus with Multi-Class LabelsFlorencia Marotta-Wurgler, David Stein. 8156-8219 [doi]
- R2-MultiOmnia: Leading Multilingual Multimodal Reasoning via Self-TrainingLeonardo Ranaldi, Federico Ranaldi, Giulia Pucci. 8220-8234 [doi]
- When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language modelsSamuel Joseph Amouyal, Aya Meltzer-Asscher, Jonathan Berant. 8235-8253 [doi]
- Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language ModelsZixiang Xu, Yanbo Wang 0005, Yue Huang 0001, Xiuying Chen, Jieyu Zhao 0001, Meng Jiang 0001, Xiangliang Zhang 0001. 8254-8284 [doi]
- VLSBench: Unveiling Visual Leakage in Multimodal SafetyXuhao Hu, Dongrui Liu, Hao Li, Xuanjing Huang 0001, Jing Shao. 8285-8316 [doi]
- Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and ReasoningSky CH-Wang, Darshan Girish Deshpande, Smaranda Muresan, Anand Kannappan, Rebecca Qian. 8317-8331 [doi]
- Data Laundering: Artificially Boosting Benchmark Results through Knowledge DistillationJonibek Mansurov, Akhmed Sakip, Alham Fikri Aji. 8332-8345 [doi]
- Conspiracy Theories and Where to Find Them on TikTokFrancesco Corso, Francesco Pierri 0002, Gianmarco De Francisci Morales. 8346-8362 [doi]
- Growing Through Experience: Scaling Episodic Grounding in Language ModelsChunhui Zhang, Sirui Wang, Zhongyu Ouyang, Xiangchi Yuan, Soroush Vosoughi. 8363-8375 [doi]
- Exploiting the Shadows: Unveiling Privacy Leaks through Lower-Ranked Tokens in Large Language ModelsYuan Zhou, Zhuo Zhang, Xiangyu Zhang. 8376-8386 [doi]
- Attacking Vision-Language Computer Agents via Pop-upsYanzhe Zhang, Tao Yu 0009, Diyi Yang. 8387-8401 [doi]
- Explicit and Implicit Data Augmentation for Social Event DetectionCongbo Ma, Yuxia Wang, Jia Wu 0001, Jian Yang 0001, Jing Du 0003, Zitai Qiu, Qing Li, Hu Wang 0003, Preslav Nakov. 8402-8415 [doi]
- In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue AgentsZhen Tan 0001, Jun Yan, I-Hung Hsu, Rujun Han, Zifeng Wang 0002, Long T. Le, Yiwen Song, Yanfei Chen, Hamid Palangi, George Lee, Anand Rajan Iyer, Tianlong Chen 0001, Huan Liu 0001, Chen-Yu Lee, Tomas Pfister. 8416-8439 [doi]
- Revisiting Classical Chinese Event Extraction with Ancient Literature InformationXiaoyi Bao, Zhongqing Wang, Jinghang Gu, Chu-Ren Huang. 8440-8451 [doi]
- Unanswerability Evaluation for Retrieval Augmented GenerationXiangyu Peng, Prafulla Kumar Choubey, Caiming Xiong, Chien-Sheng Wu. 8452-8472 [doi]
- SCALE: Towards Collaborative Content Analysis in Social Science with Large Language Model Agents and Human InterventionChengshuai Zhao, Zhen Tan 0001, Chau-Wai Wong, Xinyan Zhao, Tianlong Chen 0001, Huan Liu 0001. 8473-8503 [doi]
- Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical ReasoningErxin Yu, Jing Li, Ming Liao, Qi Zhu, Boyang Xue, Minghui Xu, Baojun Wang, Lanqing Hong, Fei Mi, Lifeng Shang. 8504-8519 [doi]
- RAGEval: Scenario Specific RAG Evaluation Dataset Generation FrameworkKunlun Zhu, Yifan Luo, Dingling Xu, Yukun Yan, Zhenghao Liu 0001, Shi Yu 0001, Ruobing Wang, Shuo Wang 0013, Yishan Li, Nan Zhang, Xu Han 0007, Zhiyuan Liu 0001, Maosong Sun 0001. 8520-8544 [doi]
- A Survey on Patent Analysis: From NLP to Multimodal AIHomaira Huda Shomee, Zhu Wang, Sathya N. Ravi, Sourav Medya. 8545-8561 [doi]
- SciVer: Evaluating Foundation Models for Multimodal Scientific Claim VerificationChengye Wang, Yifei Shen, Zexi Kuang, Arman Cohan, Yilun Zhao 0001. 8562-8579 [doi]
- MultiAgentBench: Evaluating the Collaboration and Competition of LLM agentsKunlun Zhu, Hongyi Du, Zhaochen Hong, Xiaocheng Yang, Shuyi Guo, Zhe Wang, Zhenhailong Wang, Cheng Qian 0008, Robert Tang, Heng Ji 0001, Jiaxuan You. 8580-8622 [doi]
- Sinhala Encoder-only Language Models and EvaluationTharindu Ranasinghe, Hansi Hettiarachchi, Nadeesha Chathurangi Naradde Vidana Pathirana, Damith Premasiri, Lasitha Uyangodage, Isuri Anuradha Nanomi Arachchige, Alistair Plum, Paul Rayson, Ruslan Mitkov. 8623-8636 [doi]
- LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English WritingZhengxiang Wang, Veronika Makarova, Zhi Li, Jordan Kodner, Owen Rambow. 8637-8663 [doi]
- SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?Haomin Zhuang, Yihua Zhang, Kehan Guo, Jinghan Jia, Gaowen Liu, Sijia Liu 0001, Xiangliang Zhang 0001. 8664-8678 [doi]
- Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and ChallengesBolei Ma, Yuting Li, Wei Zhou, Ziwei Gong, Yang Janet Liu, Katja Jasinskaja, Annemarie Friedrich, Julia Hirschberg, Frauke Kreuter, Barbara Plank. 8679-8696 [doi]
- LocAgent: Graph-Guided LLM Agents for Code LocalizationZhaoling Chen, Robert Tang, Gangda Deng, Fang Wu, Jialong Wu 0010, Zhiwei Jiang, Viktor K. Prasanna, Arman Cohan, Xingyao Wang 0002. 8697-8727 [doi]
- COSMMIC: Comment-Sensitive Multimodal Multilingual Indian Corpus for Summarization and Headline GenerationRaghvendra Kumar 0003, Mohammed Salman S. A, Aryan Sahu, Tridib Nandi, Pragathi Y. P., Sriparna Saha 0001, José G. Moreno 0001. 8728-8748 [doi]
- Mind the Gap: Static and Interactive Evaluations of Large Audio ModelsMinzhi Li, William Barr Held, Michael J. Ryan, Kunat Pipatanakul, Potsawee Manakul, Hao Zhu, Diyi Yang. 8749-8766 [doi]
- Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on ManchuRenhao Pei, Yihong Liu 0001, Peiqin Lin, François Yvon, Hinrich Schütze. 8767-8788 [doi]
- CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMsJizhan Fang, Tianhe Lu, Yunzhi Yao, Ziyan Jiang, Xin Xu 0010, Huajun Chen, Ningyu Zhang 0001. 8789-8807 [doi]
- TripleFact: Defending Data Contamination in the Evaluation of LLM-driven Fake News DetectionCheng Xu 0006, Nan Yan. 8808-8823 [doi]
- Meaning Beyond Truth Conditions: Evaluating Discourse Level Understanding via Anaphora AccessibilityXiaomeng Zhu 0004, Zhenghao Zhou, Simon Charlow, Robert Frank 0001. 8824-8842 [doi]
- Large Language and Reasoning Models are Shallow Disjunctive ReasonersIrtaza Khalid, Amir Masoud Nourollah, Steven Schockaert. 8843-8869 [doi]
- Warmup Generations: A Task-Agnostic Approach for Guiding Sequence-to-Sequence Learning with Unsupervised Initial State GenerationSenyu Li, Zipeng Sun, Jiayi Wang, Xue Liu, Pontus Stenetorp, Siva Reddy, David Ifeoluwa Adelani. 8870-8880 [doi]
- Building Better: Avoiding Pitfalls in Developing Language Resources when Data is ScarceNedjma Ousidhoum, Meriem Beloucif, Saif M. Mohammad. 8881-8894 [doi]
- BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 LanguagesShamsuddeen Hassan Muhammad, Nedjma Ousidhoum, Idris Abdulmumin, Jan Philip Wahle, Terry Ruas, Meriem Beloucif, Christine de Kock, Nirmal Surange, Daniela Teodorescu, Ibrahim Said Ahmad, David Ifeoluwa Adelani, Alham Fikri Aji, Felermino D. M. A. Ali, Ilseyar Alimova, Vladimir Araujo, Nikolay Babakov, Naomi Baes, Ana-Maria Bucur, Andiswa Bukula, Guanqun Cao, Rodrigo Tufino Cardenas, Rendi Chevi, Chiamaka Ijeoma Chukwuneke, Alexandra Ciobotaru, Daryna Dementieva, Murja Sani Gadanya, Robert Geislinger, Bela Gipp, Oumaima Hourrane, Oana Ignat, Falalu Ibrahim Lawan, Rooweither Mabuya, Rahmad Mahendra, Vukosi Marivate, Alexander Panchenko, Andrew Piper, Charles Henrique Porto Ferreira, Vitaly Protasov, Samuel Rutunda, Manish Shrivastava 0001, Aura Cristina Udrea, Lilian Diana Awuor Wanzare, Sophie Wu, Florian Valentin Wunderlich, Hanif Muhammad Zhafran, Tianhui Zhang, Yi Zhou 0019, Saif M. Mohammad. 8895-8916 [doi]
- SkillVerse: Assessing and Enhancing LLMs with Tree EvaluationYufei Tian, Jiao Sun, Nanyun Peng 0001, Zizhao Zhang. 8917-8933 [doi]
- CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM EraYanlin Feng, Simone Papicchio, Sajjadur Rahman. 8934-8958 [doi]
- Empathy Prediction from Diverse PerspectivesFrancine Chen 0001, Scott A. Carter, Tatiana Lau, Nayeli Suseth Bravo, Sumanta Bhattacharyya, Kate A. Sieck, Charlene C. Wu. 8959-8974 [doi]
- Are LLMs effective psychological assessors? Leveraging adaptive RAG for interpretable mental health screening through psychometric practiceFederico Ravenda, Seyed Ali Bahrainian, Andrea Raballo, Antonietta Mira, Noriko Kando. 8975-8991 [doi]
- INTERACT: Enabling Interactive, Question-Driven Learning in Large Language ModelsAum Kendapadi, Kerem Zaman, Rakesh R. Menon, Shashank Srivastava. 8992-9024 [doi]
- Circuit Stability Characterizes Language Model GeneralizationAlan Sun. 9025-9040 [doi]
- Comparing LLM-generated and human-authored news text using formal syntactic theoryOlga Zamaraeva, Dan Flickinger, Francis Bond, Carlos Gómez-Rodríguez. 9041-9060 [doi]
- Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying ProbesSharan Maiya, Yinhong Liu, Ramit Debnath, Anna Korhonen. 9061-9081 [doi]
- White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMsYixin Wan, Kai-Wei Chang. 9082-9108 [doi]
- AIMSCheck: Leveraging LLMs for AI-Assisted Review of Modern Slavery Statements Across JurisdictionsAdriana Eufrosina Bora, Akshatha Arodi, Duoyi Zhang, Jordan Bannister, Mirko Bronzi, Arsène Fansi Tchango, Md. Abul Bashar, Richi Nayak, Kerrie L. Mengersen. 9109-9135 [doi]
- Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual EvidenceMohsen Fayyaz, Ali Modarressi, Hinrich Schütze, Nanyun Peng 0001. 9136-9152 [doi]
- SelfElicit: Your Language Model Secretly Knows Where is the Relevant EvidenceZhining Liu 0002, Rana Ali Amjad, Ravinarayana Adkathimar, Tianxin Wei, Hanghang Tong. 9153-9173 [doi]
- The Male CEO and the Female Assistant: Evaluation and Mitigation of Gender Biases in Text-To-Image Generation of Dual SubjectsYixin Wan, Kai-Wei Chang. 9174-9190 [doi]
- Mitigating Shortcut Learning with InterpoLated LearningMichalis Korakakis, Andreas Vlachos 0003, Adrian Weller. 9191-9206 [doi]
- Toward Automatic Discovery of a Canine Phonetic AlphabetTheron S. Wang, Xingyuan Li, Hridayesh Lekhak, Tuan Minh Dang, Mengyue Wu, Kenny Q. Zhu. 9207-9219 [doi]
- DavIR: Data Selection via Implicit Reward for Large Language ModelsHaotian Zhou, Tingkai Liu, Qianli Ma, Yufeng Zhang, Jianbo Yuan, Pengfei Liu, Yang You 0001, Hongxia Yang. 9220-9237 [doi]
- Byte Latent Transformer: Patches Scale Better Than TokensArtidoro Pagnoni, Ramakanth Pasunuru, Pedro Rodríguez 0001, John Nguyen, Benjamin Muller, Margaret Li, Chunting Zhou, Lili Yu, Jason E. Weston, Luke Zettlemoyer, Gargi Ghosh, Mike Lewis, Ari Holtzman, Srini Iyer 0001. 9238-9258 [doi]
- DiffuseDef: Improved Robustness to Adversarial Attacks via Iterative DenoisingZhenhao Li 0003, Huichi Zhou, Marek Rei, Lucia Specia. 9259-9274 [doi]
- Identifying Cellular Niches in Spatial Transcriptomics: An Investigation into the Capabilities of Large Language ModelsHuanhuan Wei, Xiao Luo 0001, Hongyi Yu, Jinping Liang, Luning Yang, Lixing Lin, Alexandra Popa, Xiting Yan. 9275-9289 [doi]
- Culture Matters in Toxic Language Detection in PersianZahra Bokaei, Walid Magdy, Bonnie Webber. 9290-9304 [doi]
- Bitnet.cpp: Efficient Edge Inference for Ternary LLMsJinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia 0005, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei. 9305-9322 [doi]
- Instance-Selection-Inspired Undersampling Strategies for Bias Reduction in Small and Large Language Models for Binary Text ClassificationGuilherme Fonseca, Washington Cunha, Gabriel Prenassi, Marcos André Gonçalves, Leonardo Chaves Dutra da Rocha. 9323-9340 [doi]
- Forward Knows Efficient Backward Path: Saliency-Guided Memory-Efficient Fine-tuning of Large Language ModelsYeachan Kim, SangKeun Lee 0001. 9341-9356 [doi]
- Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment TuningAofei Chang, Le Huang, Alex James Boyd, Parminder Bhatia, Taha A. Kass-Hout, Cao Xiao, Fenglong Ma. 9357-9372 [doi]
- LLMs + Persona-Plug = Personalized LLMsJiongnan Liu, Yutao Zhu 0001, Shuting Wang 0002, Xiaochi Wei, Erxue Min, Yu Lu, Shuaiqiang Wang, Dawei Yin, Zhicheng Dou. 9373-9385 [doi]
- Developmentally-plausible Working Memory Shapes a Critical Period for Language AcquisitionMasato Mita, Ryo Yoshida, Yohei Oseki. 9386-9399 [doi]
- IRIS: An Iterative and Integrated Framework for Verifiable Causal Discovery in the Absence of Tabular DataTao Feng 0013, Lizhen Qu, Niket Tandon, Gholamreza Haffari. 9400-9428 [doi]
- INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African LanguagesHao Yu, Jesujoba Oluwadara Alabi, Andiswa Bukula, Jian Yun Zhuang, En-Shiun Annie Lee, Tadesse Kebede Guge, Israel Abebe Azime, Happy Buzaaba, Blessing Kudzaishe Sibanda, Godson Koffi Kalipe, Jonathan Mukiibi, Salomon Kabongo Kabenamualu, Mmasibidi Setaka, Lolwethu Ndolela, Nkiruka Odu, Rooweither Mabuya, Shamsuddeen Hassan Muhammad, Salomey Osei, Sokhar Samb, Dietrich Klakow, David Ifeoluwa Adelani. 9429-9452 [doi]
- Boosting Long-Context Information Seeking via Query-Guided Activation RefillingHongjin Qian, Zheng Liu 0011, Peitian Zhang, Zhicheng Dou, Defu Lian. 9453-9464 [doi]
- Efficient Pretraining Data Selection for Language Models via Multi-Actor CollaborationTianyi Bai, Ling Yang 0006, Zhen Hao Wong, Fupeng Sun, Xinlin Zhuang, Jiahui Peng, Chi Zhang, Lijun Wu, Jiantao Qiu, Wentao Zhang 0001, Binhang Yuan, Conghui He. 9465-9491 [doi]
- AdaDHP: Fine-Grained Fine-Tuning via Dual Hadamard Product and Adaptive Parameter SelectionHan Liu 0008, Changya Li, Xiaotong Zhang 0003, Feng Zhang 0027, Fenglong Ma, Wei Wang 0077, Hong Yu 0005. 9492-9504 [doi]
- KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge GraphJinhao Jiang, Kun Zhou 0002, Xin Zhao 0018, Yang Song 0021, Chen Zhu 0003, Hengshu Zhu, Ji-Rong Wen. 9505-9523 [doi]
- Curriculum Debiasing: Toward Robust Parameter-Efficient Fine-Tuning Against Dataset BiasesMingyu Lee, Yeachan Kim, Wing-Lam Mok, SangKeun Lee 0001. 9524-9540 [doi]
- Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual SettingsAustin Xu, Srijan Bansal, Yifei Ming, Semih Yavuz, Shafiq Joty. 9541-9564 [doi]
- On the Reliability of Large Language Models for Causal DiscoveryTao Feng 0013, Lizhen Qu, Niket Tandon, Zhuang Li 0001, Xiaoxi Kang, Gholamreza Haffari. 9565-9590 [doi]
- Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media ContextsJingxuan Li, Yuning Yang, Shengqi Yang, Linfan Zhang, Ying Nian Wu. 9591-9610 [doi]
- TeRDy: Temporal Relation Dynamics through Frequency Decomposition for Temporal Knowledge Graph CompletionZiyang Liu 0004, Chaokun Wang. 9611-9622 [doi]
- Incorporating Domain Knowledge into Materials TokenizationYerim Oh, Jun-Hyung Park, Junho Kim, Sungho Kim, SangKeun Lee 0001. 9623-9644 [doi]
- PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context OptimizationYidan Wang, Yanan Cao 0001, Yubing Ren, Fang Fang 0009, Zheng Lin 0001, Binxing Fang. 9645-9660 [doi]
- Agents Under Siege: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt AttacksRana Muhammad Shahroz, Zhen Tan 0001, Sukwon Yun, Charles Fleming, Tianlong Chen 0001. 9661-9674 [doi]
- Semantic-Eval: A Semantic Comprehension Evaluation Framework for Large Language Models Generation without TrainingShusheng Li, Jiale Li, Yifei Qu, Xinwei Shi, Yanliang Guo, Ziyi He, Yubo Wang, Wenjun Tan. 9675-9690 [doi]
- Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic BiasesMichael Y. Hu, Jackson Petty, Chuan Shi, William Merrill, Tal Linzen. 9691-9709 [doi]
- When to Speak, When to Abstain: Contrastive Decoding with AbstentionHyuhng Joon Kim, Youna Kim, Sang-goo Lee, Taeuk Kim. 9710-9730 [doi]
- On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMsHerun Wan, Minnan Luo, Zhixiong Su, Guang Dai, Xiang Zhao 0002. 9731-9761 [doi]
- Investigating and Extending Homans' Social Exchange Theory with Large Language Model based AgentsLei Wang, Zheqing Zhang, Xu Chen. 9762-9777 [doi]
- A Drop-In Solution for On-the-Fly Adaptation of Speculative Decoding in Large Language ModelsJiesong Liu, Brian Park, Xipeng Shen. 9778-9794 [doi]
- If Attention Serves as a Cognitive Model of Human Memory Retrieval, What is the Plausible Memory Representation?Ryo Yoshida, Shinnosuke Isono, Kohei Kajikawa, Taiga Someya, Yushi Sugimoto, Yohei Oseki. 9795-9812 [doi]
- Aligning VLM Assistants with Personalized Situated CognitionYongqi Li 0002, Shen Zhou, Xiaohu Li, Xin Miao, Jintao Wen, Mayi Xu, Jianhao Chen, Birong Pan, Hankun Kang, Yuanyuan Zhu 0001, Ming Zhong 0002, Tieyun Qian. 9813-9839 [doi]
- Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language ModelsZhisong Zhang, Yan Wang 0060, Xinting Huang, Tianqing Fang, Hongming Zhang 0009, Chenlong Deng, Shuaiyi Li, Dong Yu 0001. 9840-9855 [doi]
- Faster Speculative Decoding via Effective Draft Decoder with Pruned Candidate TreeHuanran Zheng, Xiaoling Wang. 9856-9868 [doi]
- Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language ModelsZhuojun Ding, Wei Wei 0002, Chenghao Fan. 9869-9886 [doi]
- Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based AgentsTao Wu, Jingyuan Chen, Wang Lin, Mengze Li, Yumeng Zhu, Ang Li, Kun Kuang, Fei Wu. 9887-9908 [doi]
- CADReview: Automatically Reviewing CAD Programs with Error Detection and CorrectionJiali Chen, Xusen Hei, Hongfei Liu, Yuancheng Wei, Zikun Deng, Jiayuan Xie, Yi Cai 0001, Qing Li 0001. 9909-9927 [doi]
- Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward ModelingJunyi Li, Hwee Tou Ng. 9928-9942 [doi]
- The Lawyer That Never Thinks: Consistency and Fairness as Keys to Reliable AIDana R. Alsagheer, Abdulrahman Kamal, Mohammad Kamal, Cosmo Yang Wu, Weidong Shi. 9943-9954 [doi]
- Polishing Every Facet of the GEM: Testing Linguistic Competence of LLMs and Humans in KoreanSungho Kim, Nayeon Kim 0002, Taehee Jeon, SangKeun Lee 0001. 9955-9984 [doi]
- SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation MethodsWen Huang 0004, Yanmei Gu, Zhiming Wang, Huijia Zhu, Yanmin Qian. 9985-9998 [doi]
- ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code GenerationHouxing Ren, Mingjie Zhan, Zhongyuan Wu, Aojun Zhou, Junting Pan, Hongsheng Li 0001. 9999-10020 [doi]
- InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes Under Herd BehaviorHuisheng Wang, Zhuoshi Pan, Hangjing Zhang, Mingxiao Liu, Hanqing Gao, H. Vicky Zhao. 10021-10052 [doi]
- Enhancing Neural Machine Translation Through Target Language Data: A kNN-LM Approach for Domain AdaptationAbudurexiti Reheman, Hongyu Liu, Junhao Ruan, Abudukeyumu Abudula, Yingfeng Luo, Tong Xiao, Jingbo Zhu. 10053-10065 [doi]
- Multi-level Relevance Document Identifier Learning for Generative Retrieval. Fuwei Zhang, Xiaoyu Liu, Xinyu Jia, Yingfei Zhang, Shuai Zhang, Xiang Li 0067, Fuzhen Zhuang, Wei Lin, Zhao Zhang 0011. 10066-10080
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models. Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao 0007, Kaipeng Zhang, Ping Luo 0002. 10081-10100
- Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder. Siting Li, Pang Wei Koh, Simon Shaolei Du. 10101-10119
- NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization. Hyuntak Kim, Byung-Hak Kim. 10120-10157
- HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models. Xiao Wang, Jingyun Hua, Weihong Lin, Yuanxing Zhang, Fuzheng Zhang, Jianlong Wu, Di Zhang, Liqiang Nie. 10158-10181
- Uni-Retrieval: A Multi-Style Retrieval Framework for STEM's Education. Yanhao Jia, Xinyi Wu, Li Hao, Qinglin Zhang, Yuxiao Hu, Shuai Zhao, Wenqi Fan. 10182-10197
- DenseLoRA: Dense Low-Rank Adaptation of Large Language Models. Lin Mu, Xiaoyu Wang, Li Ni, Yang Li, Zhize Wu, Peiquan Jin, Yiwen Zhang. 10198-10211
- Exploring the Potential of LLMs as Personalized Assistants: Dataset, Evaluation, and Analysis. Jisoo Mok, Ik-Hwan Kim, Sangkwon Park, Sungroh Yoon. 10212-10239
- Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models. Yuheng Chen, Pengfei Cao, Yubo Chen 0001, Yining Wang, Shengping Liu, Kang Liu 0001, Jun Zhao 0001. 10240-10261
- Towards Context-Robust LLMs: A Gated Representation Fine-tuning Approach. Shenglai Zeng, Pengfei He, Kai Guo 0003, Tianqi Zheng, Hanqing Lu, Yue Xing 0002, Hui Liu 0031. 10262-10276
- On Support Samples of Next Word Prediction. Yuqian Li, Yupei Du, Yufang Liu, Feifei Feng, Mou Xiao Feng, Yuanbin Wu. 10277-10289
- WebWalker: Benchmarking LLMs in Web Traversal. Jialong Wu 0007, Wenbiao Yin, Yong Jiang 0001, Zhenglin Wang, Zekun Xi, Runnan Fang, Linhai Zhang, Yulan He 0001, Deyu Zhou, Pengjun Xie, Fei Huang 0002. 10290-10305
- From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models. Yidan Wang, Yubing Ren, Yanan Cao 0001, Binxing Fang. 10306-10322
- AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs. Hongxin Li, Jingfan Chen, Jingran Su, YunTao Chen, Qing Li 0001, Zhaoxiang Zhang 0001. 10323-10358
- Introducing Graph Context into Language Models through Parameter-Efficient Fine-Tuning for Lexical Relation Mining. Jingwen Sun, Zhiyi Tian, Yu He, Jingwei Sun 0001, Guangzhong Sun. 10359-10374
- S-RAG: A Novel Audit Framework for Detecting Unauthorized Use of Personal Data in RAG Systems. Zhirui Zeng, Jiamou Liu, Meng-Fen Chiang, Jialing He, Zijian Zhang 0001. 10375-10385
- Praetor: A Fine-Grained Generative LLM Evaluator with Instance-Level Customizable Evaluation Criteria. Yongqi Leng, Renren Jin, Yue Chen, Zhuowen Han, Ling Shi, Jianxiang Peng, Lei Yang, Juesi Xiao, Deyi Xiong. 10386-10418
- Mitigating Confounding in Speech-Based Dementia Detection through Weight Masking. Zhecheng Sheng, Xiruo Ding, Brian Hur, Changye Li 0001, Trevor Cohen, Serguei V. S. Pakhomov. 10419-10434
- MCS-Bench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in Chinese Classical Studies. Yang Liu 0353, Jiahuan Cao, Hiuyi Cheng, Yongxin Shi, Kai Ding 0009, Lianwen Jin. 10435-10492
- The Knowledge Microscope: Features as Better Analytical Lenses than Neurons. Yuheng Chen, Pengfei Cao, Kang Liu, Jun Zhao. 10493-10515
- From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding. Chiwei Zhu, Benfeng Xu, Xiaorui Wang, Zhendong Mao 0001. 10516-10543
- PrivaCI-Bench: Evaluating Privacy with Contextual Integrity and Legal Compliance. Haoran Li 0003, Wenbin Hu, Huihao Jing, Yulin Chen, Qi Hu, Sirui Han, Tianshu Chu, Peizhao Hu, Yangqiu Song. 10544-10559
- Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View. Yanran Wu, Inez Hua, Yi Ding 0006. 10560-10576
- ExpeTrans: LLMs Are Experiential Transfer Learners. Jinglong Gao, Xiao Ding, Lingxiao Zou, Bibo Cai, Bing Qin 0001, Ting Liu 0001. 10577-10616
- Cool-Fusion: Fuse Large Language Models without Training. Cong Liu, Xiaojun Quan, Yan Pan, Weigang Wu, Xu Chen 0004, Liang Lin. 10617-10627
- DAPE V2: Process Attention Score as Feature Map for Length Extrapolation. Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, Jingyao Li, Minbin Huang, Xiaozhe Ren, Michael Ng 0001, Xin Jiang, Zhenguo Li, Yu Li. 10628-10666
- MuSC: Improving Complex Instruction Following with Multi-granularity Self-Contrastive Training. Hui Huang 0021, Jiaheng Liu, Yancheng He, Shilong Li, Bing Xu, Conghui Zhu, Muyun Yang, Tiejun Zhao. 10667-10686
- LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation. Zican Dong, Junyi Li, Jinhao Jiang, Mingyu Xu, Xin Zhao, Bingning Wang, Weipeng Chen. 10687-10707
- APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs. Yuxiang Huang, Mingye Li, Xu Han 0007, Chaojun Xiao, Weilin Zhao, Sun Ao, Hao Zhou 0012, Jie Zhou 0016, Zhiyuan Liu 0001, Maosong Sun 0001. 10708-10727
- PPT: A Minor Language News Recommendation Model via Cross-Lingual Preference Pattern Transfer. Yiyang Zhang, Nan Chen. 10728-10745
- GainRAG: Preference Alignment in Retrieval-Augmented Generation through Gain Signal Synthesis. Yi Jiang, Sendong Zhao, Jianbo Li, Haochun Wang, Bing Qin 0001. 10746-10757
- Top-nσ: Eliminating Noise in Logit Space for Robust Token Sampling of LLM. Chenxia Tang, JianChun Liu, Hongli Xu, Liusheng Huang. 10758-10774
- SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation. Jialong Wu 0007, Zhenglin Wang, Linhai Zhang, Yilong Lai, Yulan He 0001, Deyu Zhou. 10775-10790
- Mitigating Non-Representative Prototypes and Representation Bias in Few-Shot Continual Relation Extraction. Thanh Duc Pham, Nam Le Hai, Linh Ngo Van 0001, Nguyen Thi Ngoc Diep, Sang Dinh, Thien Huu Nguyen. 10791-10809
- MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts. Wei Tao, Haocheng Lu, Xiaoyang Qu, Bin Zhang, Kai Lu, Jiguang Wan, Jianzong Wang. 10810-10820
- PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration. Ziqian Zeng, Jianwei Wang, Junyao Yang, Zhengdong Lu, Haoran Li, Huiping Zhuang, Cen Chen 0002. 10821-10855
- Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models. Xinlin Zhuang, Jiahui Peng, Ren Ma, Yinfan Wang, Tianyi Bai, Xingjian Wei, Jiantao Qiu, Chi Zhang, Ying Qian, Conghui He. 10856-10896
- GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning. Qingchen Yu, Zifan Zheng, Ding Chen, Simin Niu, Bo Tang, Feiyu Xiong, Zhiyu Li. 10897-10912
- Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition. Kehua Feng, Keyan Ding, Hongzhi Tan, Kede Ma, Zhihua Wang 0002, Shuangquan Guo, Yuzhou Cheng, Ge Sun, Guozhou Zheng, Qiang Zhang 0026, Huajun Chen. 10913-10947
- DTCRS: Dynamic Tree Construction for Recursive Summarization. Guanran Luo, ZhongQuan Jian, Wentao Qiu, Meihong Wang, Qingqiang Wu 0001. 10948-10963
- A Generative Adaptive Replay Continual Learning Model for Temporal Knowledge Graph Reasoning. Zhiyu Zhang, Wei Chen 0105, Youfang Lin, Huaiyu Wan. 10964-10977
- ARise: Towards Knowledge-Augmented Reasoning via Risk-Adaptive Search. Yize Zhang, Tianshu Wang 0002, Sirui Chen, Kun Wang, Xingyu Zeng, Hongyu Lin, Xianpei Han, Le Sun 0001, Chaochao Lu. 10978-10995
- PKAG-DDI: Pairwise Knowledge-Augmented Language Model for Drug-Drug Interaction Event Text Generation. Ziyan Wang, Zhankun Xiong, Feng Huang 0004, Wen Zhang 0008. 10996-11010
- Knowledge-Augmented Multimodal Clinical Rationale Generation for Disease Diagnosis with Small Language Models. Shuai Niu, Jing Ma 0004, Hongzhan Lin 0001, Liang Bai, Zhihua Wang 0008, Richard Yi Da Xu, Yunya Song, Xian Yang 0001. 11011-11024
- TWIST: Text-encoder Weight-editing for Inserting Secret Trojans in Text-to-Image Models. Xindi Li, Zhe Liu 0001, Tong Zhang, Jiahao Chen, Qingming Li, Jinbao Li, Shouling Ji. 11025-11041
- Frictional Agent Alignment Framework: Slow Down and Don't Break Things. Abhijnan Nath, Carine Graff, Andrei Bachinin, Nikhil Krishnaswamy. 11042-11089
- Powerformer: Efficient and High-Accuracy Privacy-Preserving Language Model with Homomorphic Encryption. Dongjin Park, Eunsang Lee, Joon-Woo Lee. 11090-11111
- Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs. Weixiang Zhao, Yulin Hu, Yang Deng 0002, Jiahe Guo, Xingyu Sui, Xinyang Han, An Zhang 0003, Yanyan Zhao, Bing Qin 0001, Tat-Seng Chua, Ting Liu 0001. 11112-11137
- Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? Zihao Li 0006, Lecheng Zheng, Bowen Jin, Dongqi Fu, Baoyu Jing, Yikun Ban, Jingrui He, Jiawei Han 0001. 11138-11165
- Towards Enhanced Immersion and Agency for LLM-based Interactive Drama. Hongqiu Wu, Weiqi Wu, Tianyang Xu, Jiameng Zhang, Hai Zhao 0001. 11166-11182
- Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures. Shun Inadumi, Nobuhiro Ueda, Koichiro Yoshino. 11183-11198
- Improving Factuality with Explicit Working Memory. Mingda Chen, Yang Li, Karthik Padthe, Rulin Shao, Alicia Yi Sun, Luke Zettlemoyer, Gargi Ghosh, Wen-tau Yih. 11199-11213
- Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models. Chengao Li, Hanyu Zhang, Yunkun Xu, Hongyan Xue, Xiang Ao 0001, Qing He 0003. 11214-11232
- Dynamic Parallel Tree Search for Efficient LLM Reasoning. Yifu Ding, Wentao Jiang, Shunyu Liu 0001, Yongcheng Jing, Jinyang Guo, Yingjie Wang, Jing Zhang, Zengmao Wang, Ziwei Liu 0002, Bo Du 0001, Xianglong Liu 0001, Dacheng Tao. 11233-11252
- Pre³: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation. Junyi Chen, Shihao Bai, Zaijun Wang, Siyu Wu, Chuheng Du, Hailong Yang, Ruihao Gong, Shengzhong Liu, Fan Wu 0006, Guihai Chen. 11253-11267
- SHARE: An SLM-based Hierarchical Action CorREction Assistant for Text-to-SQL. Ge Qu, Jinyang Li 0003, Bowen Qin, Xiaolong Li, Nan Huo, Chenhao Ma 0001, Reynold Cheng. 11268-11292
- GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models. Tao Zhang 0019, Ziqian Zeng, YuxiangXiao YuxiangXiao, Huiping Zhuang, Cen Chen 0002, James R. Foulds, Shimei Pan. 11293-11311
- Large Language and Protein Assistant for Protein-Protein Interactions Prediction. Peng Zhou 0011, Pengsen Ma, Jianmin Wang 0016, Xibao Cai, Haitao Huang, Wei Liu 0005, Longyue Wang, Lai Hou Tim, Xiangxiang Zeng. 11312-11327
- An Empirical Study of Many-to-Many Summarization with Large Language Models. Jiaan Wang, Fandong Meng, Zengkui Sun, Yunlong Liang, Yuxuan Cao, Jiarong Xu, Haoxiang Shi, Jie Zhou 0016. 11328-11344
- Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models. Suhang Wu, Jialong Tang, Chengyi Yang, Pei Zhang 0011, Baosong Yang, Junhui Li, Junfeng Yao, Min Zhang 0005, Jinsong Su. 11345-11360
- GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents. Lingxiao Diao, Xinyue Xu, Wanxuan Sun, Cheng Yang, Zhuosheng Zhang 0001. 11361-11399
- TC-RAG: Turing-Complete RAG's Case study on Medical LLM Systems. Xinke Jiang, Yue Fang, Rihong Qiu, Haoyu Zhang, Yongxin Xu, Hao Chen 0103, Wentao Zhang 0008, Ruizhe Zhang 0013, Yuchen Fang 0001, Xinyu Ma, Xu Chu, Junfeng Zhao 0001, Yasha Wang. 11400-11426
- SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning. Zexiong Ma, Chao Peng 0002, Pengfei Gao, Xiangxin Meng, Yanzhen Zou, Bing Xie. 11427-11441
- MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models. Zhongzhan Huang, Guoming Ling, ShanShan Zhong, Hefeng Wu, Liang Lin. 11442-11460
- Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG. Xin Sun, Jianan Xie, Zhongqi Chen, Qiang Liu, Shu Wu, Yuehe Chen, Bowen Song, Zilei Wang, Weiqiang Wang, Liang Wang. 11461-11480
- PwnGPT: Automatic Exploit Generation Based on Large Language Models. Wanzong Peng, Lin Ye, Xuetao Du, Hongli Zhang 0001, Dongyang Zhan, Yunting Zhang, Yicheng Guo, Chen Zhang. 11481-11494
- VMLU Benchmarks: A comprehensive benchmark toolkit for Vietnamese LLMs. Cuc Thi Bui, Nguyen Truong Son, Trang Van Truong, Viet Lam Phung, Pham Nhut Huy, Hoang-Anh Le, Quoc Huu Van, Phong Nguyen-Thuan Do, Van Le Tran Truc, Duc Thanh Chau, Le Minh Nguyen. 11495-11515
- Scaling up the State Size of RNN LLMs for Long-Context Scenarios. Kai Liu, Jianfei Gao 0003, Kai Chen 0026. 11516-11529
- Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes. Bocheng Li, Zhujin Gao, Linli Xu. 11530-11551
- A Strategic Coordination Framework of Small LMs Matches Large LMs in Data Synthesis. Xin Gao, Qizhi Pei, Zinan Tang, Yu Li, Honglin Lin, Jiang Wu, Lijun Wu, Conghui He. 11552-11570
- Defining and Evaluating Visual Language Models' Basic Spatial Abilities: A Perspective from Psychometrics. Wenrui Xu, Dalin Lyu, Weihang Wang 0010, Jie Feng 0002, Chen Gao 0001, Yong Li 0008. 11571-11590
- SPHERE: Unveiling Spatial Blind Spots in Vision-Language Models Through Hierarchical Evaluation. Wenyu Zhang, Wei En Ng, Lixin Ma, Yuwen Wang, Junqi Zhao, Allison Koenecke, Boyang Li, Wanglu Wanglu. 11591-11609
- User-side Model Consistency Monitoring for Open Source Large Language Models Inference Services. Qijun Miao, Zhixuan Fang. 11610-11622
- Jailbreaking? One Step Is Enough! Weixiong Zheng, Peijian Zeng, Yiwei Li, Hongyan Wu, Nankai Lin, Junhao Chen, Aimin Yang 0002, Yongmei Zhou. 11623-11642
- Parenting: Optimizing Knowledge Selection of Retrieval-Augmented Language Models with Parameter Decoupling and Tailored Tuning. Yongxin Xu, Ruizhe Zhang 0013, Xinke Jiang, Yujie Feng, Yuzhen Xiao, Xinyu Ma, Runchuan Zhu, Xu Chu, Junfeng Zhao 0001, Yasha Wang. 11643-11662
- PaSa: An LLM Agent for Comprehensive Academic Paper Search. Yichen He, Guanhua Huang, Peiyuan Feng, Yuan Lin, Yuchen Zhang, Hang Li, Weinan E. 11663-11679
- Less Mature is More Adaptable for Sentence-level Language Modeling. Abhilasha Sancheti, David Dale, Artyom Kozhevnikov, Maha Elbayad. 11680-11695
- EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts. Subhajit Chaudhury, Payel Das, Sarathkrishna Swaminathan, Georgios Kollias, Elliot Nelson, Khushbu Pahwa, Tejaswini Pedapati, Igor Melnyk, Matthew Riemer. 11696-11708
- UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter Efficient Fine-Tuning of Large Models. Xueyan Zhang, Jinman Zhao, Zhifei Yang, Yibo Zhong, Shuhao Guan, Linbo Cao, Yining Wang. 11709-11728
- Agri-CM³: A Chinese Massive Multi-modal, Multi-level Benchmark for Agricultural Understanding and Reasoning. Haotian Wang, Yi Guan, Fanshu Meng, Chao Zhao, Lian Yan, Yang Yang 0041, Jingchi Jiang. 11729-11754
- TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification. Junnan Zhu, Min Xiao, Yining Wang, Feifei Zhai, Yu Zhou, Chengqing Zong. 11755-11771
- CaLMQA: Exploring culturally specific long-form question answering across 23 languages. Shane Arora, Marzena Karpinska, Hung-Ting Chen, Ipsita Bhattacharjee, Mohit Iyyer, Eunsol Choi. 11772-11817
- Croppable Knowledge Graph Embedding. Yushan Zhu, Wen Zhang 0015, Zhiqiang Liu, Mingyang Chen, Lei Liang 0002, Huajun Chen. 11818-11835
- HyKGE: A Hypothesis Knowledge Graph Enhanced RAG Framework for Accurate and Reliable Medical LLMs Responses. Xinke Jiang, Ruizhe Zhang 0013, Yongxin Xu, Rihong Qiu, Yue Fang, Zhiyuan Wang, Jinyi Tang, Hongxin Ding, Xu Chu, Junfeng Zhao 0001, Yasha Wang. 11836-11856
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models. Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, WangYan WangYan, Wei Shen, Qing Gu 0001, Anh Tuan Luu, See-Kiong Ng, Zhiwei Jiang, Bryan Hooi. 11857-11870
- BeamLoRA: Beam-Constraint Low-Rank Adaptation. Naibin Gu, Zhenyu Zhang 0006, Xiyu Liu 0003, Peng Fu 0008, Zheng Lin 0001, Shuohuan Wang, Yu Sun, Hua Wu 0003, Weiping Wang 0005, Haifeng Wang 0001. 11871-11883
- GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art. Yiming Lei, Chenkai Zhang, Zeming Liu, Haitao Leng, Shaoguo Liu, Tingting Gao, Qingjie Liu, Yunhong Wang 0001. 11884-11952
- UniLR: Unleashing the Power of LLMs on Multiple Legal Tasks with a Unified Legal Retriever. Ang Li, Yiquan Wu 0001, YiFei Liu, Ming Cai, Lizhi Qing, Shihang Wang, Yangyang Kang, Chengyuan Liu, Fei Wu 0001, Kun Kuang. 11953-11967
- Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models. Haoran Ye, Tianze Zhang, Yuhang Xie, Liyuan Zhang, Yuanyi Ren, Xin Zhang, Guojie Song. 11968-11991
- Beyond Dialogue: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model. Yeyong Yu, Runsheng Yu, Haojie Wei, Zhanqiu Zhang, Quan Qian. 11992-12022
- ACECODER: Acing Coder RL via Automated Test-Case Synthesis. Huaye Zeng, Dongfu Jiang, Haozhe Wang 0002, Ping Nie, Xiaotong Chen, Wenhu Chen. 12023-12040
- Quantifying Semantic Emergence in Language Models. Hang Chen, Xinyu Yang, Jiaying Zhu, Wenya Wang. 12041-12054
- DebateCoder: Towards Collective Intelligence of LLMs via Test Case Driven LLM Debate for Code Generation. Jizheng Chen, Kounianhua Du, Xinyi Dai, Weiming Zhang, Xihuai Wang, Yasheng Wang, Ruiming Tang, Weinan Zhang 0001, Yong Yu 0001. 12055-12065
- The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models. Chen Qian 0003, Dongrui Liu, Jie Zhang 0012, Yong Liu, Jing Shao. 12066-12095
- GraphInsight: Unlocking Insights in Large Language Models for Graph Structure Understanding. Yukun Cao, Shuo Han, Zengyi Gao, Zezhong Ding, Xike Xie, S. Kevin Zhou. 12096-12134
- Phonotomizer: A Compact, Unsupervised, Online Training Approach to Real-Time, Multilingual Phonetic Segmentation. Michael S. Yantosca, Albert M. K. Cheng. 12135-12147
- A Multi-persona Framework for Argument Quality Assessment. Bojun Jin, Jianzhu Bao, Yufang Hou, Yang Sun, Yice Zhang, Huajie Wang, Bin Liang 0004, Ruifeng Xu 0001. 12148-12170
- Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification. Chengwu Liu 0001, Ye Yuan 0016, Yichun Yin, Yan Xu, Xin Xu, Zaoyu Chen, Yasheng Wang, Lifeng Shang, Qun Liu 0001, Ming Zhang 0004. 12171-12186
- SAM Decoding: Speculative Decoding via Suffix Automaton. Yuxuan Hu, Ke Wang, Xiaokang Zhang, Fanjin Zhang, Cuiping Li, Hong Chen, Jing Zhang. 12187-12204
- PsyAdvisor: A Plug-and-Play Strategy Advice Planner with Proactive Questioning in Psychological Conversations. Yuxin Hu, Danni Liu, Bo Liu 0004, Yida Chen, Jiuxin Cao, Yan Liu. 12205-12229
- HomeBench: Evaluating LLMs in Smart Homes with Valid and Invalid Instructions Across Single and Multiple Devices. Silin Li, Yuhang Guo 0001, Jiashu Yao, Zeming Liu, Haifeng Wang 0001. 12230-12250
- Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment. Xueyao Zhang, Yuancheng Wang, Chaoren Wang, Ziniu Li, Zhuo Chen 0006, Zhizheng Wu 0001. 12251-12270
- GiFT: Gibbs Fine-Tuning for Code Generation. Haochen Li 0009, Wanjin Feng, Xin Zhou 0008, Zhiqi Shen 0001. 12271-12284
- Enhancing Interpretable Image Classification Through LLM Agents and Conditional Concept Bottleneck Models. Yiwen Jiang, Deval Mehta, Wei Feng 0015, ZongYuan Ge. 12285-12297
- Reliably Bounding False Positives: A Zero-Shot Machine-Generated Text Detection Framework via Multiscaled Conformal Prediction. Xiaowei Zhu, Yubing Ren, Yanan Cao, Xixun Lin, Fang Fang, Yangxi Li. 12298-12319
- RSCF: Relation-Semantics Consistent Filter for Entity Embedding of Knowledge Graph. Junsik Kim, Jinwook Park, Kangil Kim. 12320-12336
- RolePlot: A Systematic Framework for Evaluating and Enhancing the Plot-Progression Capabilities of Role-Playing Agents. Pinyi Zhang, Siyu An, Lingfeng Qiao, Yifei Yu, Jingyang Chen, Jie Wang, Di Yin, Xing Sun, Kai Zhang. 12337-12354
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search. Zhenyu Hou, Ziniu Hu, Yujiang Li, Rui Lu, Jie Tang 0001, Yuxiao Dong. 12355-12369
- Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model. Emre Can Acikgoz, Jeremiah Greer, Akul Datta, Ze Yang, William Zeng, Oussama Elachqar, Emmanouil Koukoumidis, Dilek Hakkani-Tür, Gokhan Tur. 12370-12390
- Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation. Yupu Liang, Yaping Zhang, Zhiyang Zhang, Yang Zhao 0007, Lu Xiang, Chengqing Zong, Yu Zhou 0001. 12391-12408
- SDPO: Segment-Level Direct Preference Optimization for Social Agents. Aobo Kong, Wentao Ma, Shiwan Zhao, Yongbin Li, Yuchuan Wu, Ke Wang, Xiaoqian Liu, Qicheng Li, Yong Qin, Fei Huang 0002. 12409-12423
- KokoroChat: A Japanese Psychological Counseling Dialogue Dataset Collected via Role-Playing by Trained Counselors. Zhiyang Qi, Takumasa Kaneko, Keiko Takamizo, Mariko Ukiyo, Michimasa Inaba. 12424-12443
- SURVEYFORGE: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing. Xiangchao Yan, Shiyang Feng, Jiakang Yuan, Renqiu Xia, Bin Wang 0065, Lei Bai 0001, Bo Zhang 0069. 12444-12465
- Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning. Yexing Du, Youcheng Pan, Ziyang Ma 0001, Bo Yang 0006, Yifan Yang 0005, Keqi Deng, Xie Chen 0001, Yang Xiang, Ming Liu 0004, Bing Qin 0001. 12466-12478
- AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research. Yilun Zhao 0001, Weiyuan Chen, Zhijian Xu, Manasi Patwardhan 0001, Chengye Wang, Yixin Liu 0003, Lovekesh Vig, Arman Cohan. 12479-12491
- Redundancy Principles for MLLMs Benchmarks. Zicheng Zhang, Xiangyu Zhao, XinYu Fang, Chunyi Li, Xiaohong Liu 0001, Xiongkuo Min, Haodong Duan, Kai Chen 0026, Guangtao Zhai. 12492-12504
- WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models. Yifu Chen, Shengpeng Ji, Haoxiao Wang, Ziqing Wang, Siyu Chen, Jinzheng He, Jin Xu, Zhou Zhao. 12505-12523
- ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5. Jiaming Zhou, Shiyao Wang, Shiwan Zhao, Jiabei He 0001, Haoqin Sun, Hui Wang 0075, Cheng Liu, Aobo Kong, Yujie Guo, Xi Yang 0023, Yequan Wang, Yonghua Lin, Yong Qin. 12524-12537
- Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization. Yao Xiao, Hai Ye, Linyao Chen, Hwee Tou Ng, Lidong Bing, Xiaoli Li, Roy Ka-Wei Lee. 12538-12552
- Enhancing Safe and Controllable Protein Generation via Knowledge Preference Optimization. Yuhao Wang, Keyan Ding, Kehua Feng, Zeyuan Wang, Ming Qin, Xiaotong Li, Qiang Zhang 0026, Huajun Chen. 12553-12569
- SINCon: Mitigate LLM-Generated Malicious Message Injection Attack for Rumor Detection. Mingqing Zhang, Qiang Liu, Xiang Tao, Shu Wu, Liang Wang. 12570-12581
- Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models. Jungwoo Park, Taewhoo Lee, Chanwoong Yoon, Hyeon Hwang, Jaewoo Kang. 12582-12600
- Agentic Knowledgeable Self-awareness. Shuofei Qiao, Zhisong Qiu, Baochang Ren, XiaoBin Wang, Xiangyuan Ru, Ningyu Zhang 0001, Xiang Chen 0016, Yong Jiang 0001, Pengjun Xie, Fei Huang 0002, Huajun Chen. 12601-12625
- A Unified Agentic Framework for Evaluating Conditional Image Generation. Jifang Wang, Yangxue Yangxue, Longyue Wang, Zhenran Xu, Yiyu Wang, Yaowei Wang 0001, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang 0005. 12626-12646
- Planning-Driven Programming: A Large Language Model Programming Workflow. Chao Lei, Yanchuan Chang, Nir Lipovetzky, Krista A. Ehinger. 12647-12684
- Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering. Yuan Sui, Yufei He, Zifeng Ding, Bryan Hooi. 12685-12701
- Nudging: Inference-time Alignment of LLMs via Guided Decoding. Yu Fei, Yasaman Razeghi, Sameer Singh 0001. 12702-12739
- Unveiling Attractor Cycles in Large Language Models: A Dynamical Systems View of Successive Paraphrasing. Zhilin Wang, Yafu Li, Jianhao Yan, Yu Cheng 0001, Yue Zhang 0004. 12740-12755
- SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Models. Zhuang Li 0001, Yuncheng Hua, Thuy-Trang Vu, Haolan Zhan, Lizhen Qu, Gholamreza Haffari. 12756-12790
- HFT: Half Fine-Tuning for Large Language Models. Tingfeng Hui, Zhenyu Zhang 0006, Shuohuan Wang, Weiran Xu, Yu Sun 0029, Hua Wu 0003. 12791-12819
- Beyond Surface Simplicity: Revealing Hidden Reasoning Attributes for Precise Commonsense Diagnosis. Huijun Lian, Zekai Sun, Keqi Chen, Yingming Gao, Ya Li 0001. 12820-12835
- From Objectives to Questions: A Planning-based Framework for Educational Mathematical Question Generation. Cheng Cheng, Zhenya Huang, Guanhao Zhao, Yuxiang Guo 0002, Xin Lin 0005, Jinze Wu, Xin Li 0064, Shijin Wang 0001. 12836-12856
- RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts. Mingyan Wu, Zhenghao Liu 0001, Yukun Yan, Xinze Li, Shi Yu 0001, Zheni Zeng, Yu Gu 0002, Ge Yu 0001. 12857-12874
- Lost in Literalism: How Supervised Training Shapes Translationese in LLMs. Yafu Li, Ronghao Zhang, Zhilin Wang, Huajian Zhang, Leyang Cui, Yongjing Yin, Tong Xiao, Yue Zhang 0004. 12875-12894
- Accurate KV Cache Quantization with Outlier Tokens Tracing. Yi Su 0006, Yuechi Zhou, Quantong Qiu, Juntao Li, Qingrong Xia, Ping Li 0016, Xinyu Duan, Zhefeng Wang 0001, Min Zhang 0005. 12895-12915
- Can Large Language Models Understand Internet Buzzwords Through User-Generated Content. Chen Huang 0006, Junkai Luo, Xinzuo Wang, Wenqiang Lei, Jiancheng Lv 0001. 12916-12941
- EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models. Yuanteng Chen, Yuantian Shao, Peisong Wang, Jian Cheng 0001. 12942-12963
- Activation Steering Decoding: Mitigating Hallucination in Large Vision-Language Models through Bidirectional Hidden State Intervention. Jingran Su, Jingfan Chen, Hongxin Li, YunTao Chen, Li Qing, Zhaoxiang Zhang 0005. 12964-12974
- Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models. Fangzhi Xu, Qiushi Sun, Kanzhi Cheng, Jun Liu 0002, Yu Qiao 0001, Zhiyong Wu 0003. 12975-12993
- Improving Medical Large Vision-Language Models with Abnormal-Aware Feedback. Yucheng Zhou 0001, Lingran Song, Jianbing Shen. 12994-13011
- Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging. Tingfeng Hui, Zhenyu Zhang 0006, Shuohuan Wang, Yu Sun 0029, Hua Wu 0003, Sen Su. 13012-13031
- MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation. LingFeng Zhang, Xiaoshuai Hao, Qinwen Xu, Qiang Zhang, Xinyao Zhang, Pengwei Wang, Jing Zhang, Zhongyuan Wang, Shanghang Zhang, Renjing Xu. 13032-13056
- Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging. Zhenyang Cai, Junying Chen, Rongsheng Wang, Weihong Wang, Yonglin Deng, Dingjie Song, Yize Chen, Zixu Zhang, Benyou Wang. 13057-13079
- CLAIM: Mitigating Multilingual Object Hallucination in Large Vision-Language Models with Cross-Lingual Attention Intervention. Zekai Ye, Qiming Li, Xiaocheng Feng, Libo Qin 0001, Yichong Huang, Baohang Li, Kui Jiang, Yang Xiang, Zhirui Zhang, Yunfei Lu, Duyu Tang, Dandan Tu, Bing Qin 0001. 13080-13094
- Wizard of Shopping: Target-Oriented E-commerce Dialogue Generation with Decision Tree Branching. Xiangci Li, Zhiyu Chen 0001, Jason Ingyu Choi, Nikhita Vedula, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi. 13095-13120
- Qwen2.5-xCoder: Multi-Agent Collaboration for Multilingual Code Instruction Tuning. Jian Yang 0003, Wei Zhang, Yibo Miao, Shanghaoran Quan, Zhenhe Wu, Qiyao Peng 0006, Liqun Yang, Tianyu Liu 0001, Zeyu Cui, Binyuan Hui, Junyang Lin. 13121-13131
- Cultivating Gaming Sense for Yourself: Making VLMs Gaming Experts. Wenxuan Lu, Jiangyang He, Zhanqiu Zhang, Steven Y. Guo, Tianning Zang. 13132-13152
- Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning. Fangzhi Xu, Hang Yan 0010, Chang Ma, Haiteng Zhao, Qiushi Sun, Kanzhi Cheng, Junxian He, Jun Liu 0002, Zhiyong Wu 0003. 13153-13167
- Extending Complex Logical Queries on Uncertain Knowledge Graphs. Weizhi Fei, Zihao Wang 0001, Hang Yin 0008, Yang Duan, Yangqiu Song. 13168-13193
- Knowledge Decoupling via Orthogonal Projection for Lifelong Editing of Large Language Models. Haoyu Xu, Pengxiang Lan, Enneng Yang, Guibing Guo, Jianzhe Zhao, Linying Jiang, Xingwei Wang 0001. 13194-13213
- φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation. Fangzhi Xu, Hang Yan 0010, Chang Ma, Haiteng Zhao, Jun Liu 0002, Qika Lin, Zhiyong Wu 0003. 13214-13227
- Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? Leyi Pan, Aiwei Liu, Shiyu Huang 0001, Yijian Lu, Xuming Hu, Lijie Wen 0001, Irwin King, Philip S. Yu. 13228-13251
- Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization. Sunghwan Kim, Dongjin Kang, Taeyoon Kwon, Hyungjoo Chae, Dongha Lee 0003, Jinyoung Yeo. 13252-13280
- Inducing lexicons of in-group language with socio-temporal context. Christine de Kock. 13281-13291
- LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement. Boyi Kang, Xinfa Zhu, Zihan Zhang, Zhen Ye, Mingshuai Liu, Ziqian Wang, Yike Zhu, Guobin Ma, Jun Chen, Longshuai Xiao, Chao Weng, Wei Xue, Lei Xie. 13292-13305
- MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference. Kunxi Li, Zhonghua Jiang 0007, Zhouzhou Shen, Zhaode Wang, Chengfei Lv, Shengyu Zhang 0001, Fan Wu 0006, Fei Wu 0001. 13306-13318
- Efficient OpAmp Adaptation for Zoom Attention to Golden Contexts. Haoyuan Wu, Rui Ming, Haisheng Zheng, Zhuolun He, Bei Yu 0001. 13319-13331
- Language-Codec: Bridging Discrete Codec Representations and Speech Language Models. Shengpeng Ji, Minghui Fang 0002, Jialong Zuo, Ziyue Jiang 0004, Dingdong Wang, Hanting Wang, Hai Huang 0013, Zhou Zhao 0001. 13332-13345
- Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger. Wenjun Li, Dexun Li, Kuicai Dong, Cong Zhang, Hao Zhang 0048, Weiwen Liu, Yasheng Wang, Ruiming Tang, Yong Liu 0020. 13346-13370
- MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark. QiHao Zhao, Yangyu Huang, Tengchao Lv, Lei Cui 0001, Qinzheng Sun, Shaoguang Mao, Xin Zhang, Ying Xin, Qiufeng Yin, Scarlett Li, Furu Wei. 13371-13391
- Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding. Haneul Yoo, Yongjin Yang, Hwaran Lee. 13392-13413
- Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch. Yuyang Ding, Xinyu Shi, Xiaobo Liang, Juntao Li, Zhaopeng Tu, Qiaoming Zhu, Min Zhang 0005. 13414-13438
- DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing. Haneul Yoo, Jieun Han, So-Yeon Ahn, Alice Oh. 13439-13454
- PQR: Improving Dense Retrieval via Potential Query Modeling. Junfeng Kang, Rui Li 0093, Qi Liu 0003, Yanjiang Chen, Zheng Zhang 0048, Junzhe Jiang 0001, Heng Yu, Yu Su 0002. 13455-13469
- Cross-Lingual Generalization and Compression: From Language-Specific to Shared Neurons. Frederick Riemenschneider, Anette Frank. 13470-13491
- SDBench: A Survey-based Domain-specific LLM Benchmarking and Optimization Framework. Cheng Guo, Hu Kai, Shuxian Liang, Yiyang Jiang, Yi Gao 0001, Xian-Sheng Hua 0001, Wei Dong 0001. 13492-13506
- ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents. Yusheng Liao, Shuyang Jiang, Yanfeng Wang 0001, Yu Wang 0027. 13507-13531
- Lexical Recall or Logical Reasoning: Probing the Limits of Reasoning Abilities in Large Language Models. Henrike Beyer, Chris Reed 0001. 13532-13557
- ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains. Zilu Dong, Xiangqing Shen, Zinong Yang, Rui Xia. 13558-13571
- HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model. Haiyang Guo, Fanhu Zeng, Ziwei Xiang, Fei Zhu, Da-Han Wang, Xu-Yao Zhang, Cheng-Lin Liu. 13572-13586
- Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language ModelsQika Lin, Tianzhe Zhao, Kai He 0001, Zhen Peng 0005, Fangzhi Xu, Ling Huang 0003, Jingying Ma, Mengling Feng. 13587-13602 [doi]
- Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State TrackingYifan Zhang, Wenyu Du, Dongming Jin, Jie Fu 0001, Zhi Jin. 13603-13621 [doi]
- TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and CompetitionTianwei Lin, Jiang Liu, Wenqiao Zhang, Yang Dai, Haoyuan Li, Zhelun Yu, Wanggui He, Juncheng Li 0006, Jiannan Guo 0003, Hao Jiang 0014, Siliang Tang, Yueting Zhuang. 13622-13637 [doi]
- CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language ModelsLing Shi, Deyi Xiong. 13638-13659 [doi]
- STUN: Structured-Then-Unstructured Pruning for Scalable MoE PruningJaeseong Lee 0002, Seung-won Hwang, Aurick Qiao, Daniel F. Campos, Zhewei Yao, Yuxiong He. 13660-13676 [doi]
- Mimicking the Familiar: Dynamic Command Generation for Information Theft Attacks in LLM Tool-Learning SystemZiyou Jiang, Mingyang Li, Guowei Yang 0001, Junjie Wang 0001, Yuekai Huang, Zhiyuan Chang, Qing Wang 0001. 13677-13693 [doi]
- FlashAudio: Rectified Flow for Fast and High-Fidelity Text-to-Audio GenerationHuadai Liu, Jialei Wang, Rongjie Huang 0001, Yang Liu 0278, Heng Lu, Zhou Zhao 0001, Wei Xue. 13694-13710 [doi]
- How does Misinformation Affect Large Language Model Behaviors and Preferences?Miao Peng, Nuo Chen, Jianheng Tang, Jia Li. 13711-13748 [doi]
- YESciEval: Robust LLM-as-a-Judge for Scientific Question AnsweringJennifer D'Souza 0001, Hamed Babaei Giglou, Quentin Münch. 13749-13783 [doi]
- GALLa: Graph Aligned Large Language Models for Improved Source Code UnderstandingZiyin Zhang, Hang Yu 0002, Sage Lee, Peng Di, Jianguo Li, Rui Wang 0015. 13784-13802 [doi]
- MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential DiagnosisDaniel Philip Rose, Chia-Chien Hung, Marco Lepri, Israa Alqassem, Kiril Gashteovski, Carolin Lawrence. 13803-13826 [doi]
- A Training-free LLM-based Approach to General Chinese Character Error CorrectionHouquan Zhou 0001, Bo Zhang 0071, Zhenghua Li, Ming Yan 0008, Min Zhang 0005. 13827-13852 [doi]
- HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language ModelsSongtao Jiang, Yan Zhang 0004, Yeying Jin, Zhihang Tang, Yangyang Wu, Yang Feng 0011, Jian Wu 0001, Zuozhu Liu. 13853-13868 [doi]
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at ScaleJiawei Guo, Tianyu Zheng, Yizhi Li, Yuelin Bai, Bo Li, Yubo Wang, King Zhu, Graham Neubig, Wenhu Chen, Xiang Yue. 13869-13920 [doi]
- SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-TuningPrabhat Pandey, Rupak Vignesh Swaminathan, K. V. Vijay Girish, Arunasish Sen, Jian Xie, Grant P. Strimel, Andreas Schwarz. 13921-13942 [doi]
- Recent Advances in Speech Language Models: A SurveyWenqian Cui, Dianzhi Yu, Xiaoqi Jiao, Ziqiao Meng, Guangyan Zhang, Qichao Wang, Steven Y. Guo, Irwin King. 13943-13970 [doi]
- LexCLiPR: Cross-Lingual Paragraph Retrieval from Legal JudgmentsRohit Upadhya, T. Y. S. S. Santosh. 13971-13993 [doi]
- Multi-task Adversarial Attacks against Black-box Model with Few-shot QueriesWenqiang Wang, Yan Xiao 0002, Hao Lin, Yangshijie Zhang, Xiaochun Cao. 13994-14014 [doi]
- SPECTRA: Faster Large Language Model Inference with Optimized Internal and External SpeculationNguyen-Khang Le, Truong Dinh Do, Le Minh Nguyen. 14015-14034 [doi]
- Multi-level Association Refinement Network for Dialogue Aspect-based Sentiment Quadruple AnalysisZeliang Tong, Wei Wei 0002, Xiaoye Qu, Rikui Huang, Zhixin Chen, Xingyu Yan. 14035-14057 [doi]
- Innovative Image Fraud Detection with Cross-Sample Anomaly Analysis: The Power of LLMsQiwen Wang, Junqi Yang, Zhenghao Lin, Zhenzhe Ying, Weiqiang Wang, Chen Lin. 14058-14078 [doi]
- Cooperative or Competitive? Understanding the Interaction between Attention Heads From A Game Theory PerspectiveXiaoye Qu, Zengqi Yu, Dongrui Liu, Wei Wei 0002, Daizong Liu, Jianfeng Dong, Yu Cheng 0001. 14079-14099 [doi]
- MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought VerificationLinzhuang Sun, Hao Liang, Jingxuan Wei, Bihui Yu, Tianpeng Li, Fan Yang 0024, Zenan Zhou, Wentao Zhang 0001. 14100-14115 [doi]
- Graph-Structured Trajectory Extraction from TraveloguesAitaro Yamamoto, Hiroyuki Otomo, Hiroki Ouchi, Shohei Higashiyama, Hiroki Teranishi, Hiroyuki Shindo, Taro Watanabe. 14116-14132 [doi]
- Learning First-Order Logic Rules for Argumentation MiningYang Sun, Guanrong Chen, Hamid Alinejad-Rokny, Jianzhu Bao, Yuqi Huang, Bin Liang 0004, Kam-Fai Wong, Min Yang 0007, Ruifeng Xu 0001. 14133-14148 [doi]
- Investigating and Enhancing the Robustness of Large Multimodal Models Against Temporal InconsistencyJiafeng Liang, Shixin Jiang, Xuan Dong, Ning Wang, Zheng Chu, Hui Su, JinLan Fu, Ming Liu 0004, See-Kiong Ng, Bing Qin 0001. 14149-14162 [doi]
- UniRAG: Unified Query Understanding Method for Retrieval Augmented GenerationRui Li, Liyang He, Qi Liu, Zheng Zhang, Heng Yu, Yuyang Ye, Linbo Zhu, Yu Su. 14163-14178 [doi]
- Contextual Experience Replay for Self-Improvement of Language AgentsYitao Liu, Chenglei Si, Karthik R. Narasimhan, Shunyu Yao. 14179-14198 [doi]
- Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial ReasoningQi Sun, Pengfei Hong, Pala Tej Deep, Vernon Toh, U-Xuan Tan, Deepanway Ghosal, Soujanya Poria. 14199-14214 [doi]
- Towards Comprehensive Argument Analysis in Education: Dataset, Tasks, and MethodYupei Ren, Xinyi Zhou, Ning Zhang, Shangqing Zhao, Man Lan, Xiaopeng Bai. 14215-14231 [doi]
- Browsing Like Human: A Multimodal Web Agent with Experiential Fast-and-Slow ThinkingHaohao Luo, Jiayi Kuang, Wei Liu 0005, Ying Shen, Jian Luan 0001, Yang Deng. 14232-14251 [doi]
- MaXIFE: Multilingual and Cross-lingual Instruction Following EvaluationYile Liu, Ziwei Ma, Xiu Jiang, Jinglu Hu, ChangJing ChangJing, Liang Li. 14252-14332 [doi]
- Linguistic Generalizability of Test-Time Scaling in Mathematical ReasoningGuijin Son, Jiwoo Hong, Hyunwoo Ko, James Thorne. 14333-14368 [doi]
- Can MLLMs Understand the Deep Implication Behind Chinese Images?Chenhao Zhang 0005, Xi Feng, Yuelin Bai, Xeron Du, Jinchang Hou, Kaixin Deng, Guangzeng Han, Qinrui Li, Bingli Wang, Jiaheng Liu, Xingwei Qu, Yifei Zhang, Qixuan Zhao, Yiming Liang, Ziqiang Liu, Feiteng Fang, Min Yang 0007, Wenhao Huang, Chenghua Lin, Ge Zhang 0009, Shiwen Ni. 14369-14402 [doi]
- KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of KazakhstanMukhammed Togmanov, Nurdaulet Mukhituly, Diana Turmakhan, Jonibek Mansurov, Maiya Goloburda, Akhmed Sakip, Zhuohan Xie, Yuxia Wang, Bekassyl Syzdykov, Nurkhan Laiyk, Alham Fikri Aji, Ekaterina Kochmar, Preslav Nakov, Fajri Koto. 14403-14416 [doi]
- Towards Multi-dimensional Evaluation of LLM Summarization across Domains and LanguagesHyangsuk Min, Yuho Lee, Minjeong Ban, Jiaqi Deng, Nicole Hee-Yeon Kim, Taewon Yun, Hang Su, Jason Cai, Hwanjun Song. 14417-14450 [doi]
- ClusterAttn: KV Cache Compression under Intrinsic Attention ClusteringMinwei Zhang, Haifeng Sun 0001, Jingyu Wang, Shaolong Li, Wanyi Ning, Qi Qi 0001, Zirui Zhuang, Jianxin Liao. 14451-14473 [doi]
- SHARE: Shared Memory-Aware Open-Domain Long-Term Dialogue Dataset Constructed from Movie ScriptEunwon Kim, Chanho Park, Buru Chang. 14474-14498 [doi]
- Incongruity-aware Tension Field Network for Multi-modal Sarcasm DetectionJiecheng Zhang, C. L. Philip Chen, Shuzhen Li, Tong Zhang 0015. 14499-14508 [doi]
- Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in KazakhNurkhan Laiyk, Daniil Orel, Rituraj Joshi, Maiya Goloburda, Yuxia Wang, Preslav Nakov, Fajri Koto. 14509-14538 [doi]
- Stealing Training Data from Large Language Models in Decentralized Training through Activation Inversion AttackChenxi Dai, Lin Lu, Pan Zhou 0001. 14539-14551 [doi]
- From Selection to Generation: A Survey of LLM-based Active LearningYu Xia 0007, Subhojyoti Mukherjee, Zhouhang Xie, Junda Wu, Xintong Li 0001, Ryan Aponte, Hanjia Lyu, Joe Barrow, Hongjie Chen 0003, Franck Dernoncourt, Branislav Kveton, Tong Yu 0001, Ruiyi Zhang 0002, Jiuxiang Gu, Nesreen K. Ahmed, Yu Wang 0160, Xiang Chen, Hanieh Deilamsalehy, SungChul Kim, Zhengmian Hu, Yue Zhao 0016, Nedim Lipka, Seunghyun Yoon 0002, Ting-Hao Kenneth Huang, Zichao Wang 0001, Puneet Mathur, Soumyabrata Pal, Koyel Mukherjee, Zhehao Zhang, Namyong Park, Thien Huu Nguyen, Jiebo Luo 0001, Ryan A. Rossi, Julian J. McAuley. 14552-14569 [doi]
- OmniFlatten: An End-to-end GPT Model for Seamless Voice ConversationQinglin Zhang, Luyao Cheng, Chong Deng, Qian Chen, Wen Wang, Siqi Zheng, Jiaqing Liu, Hai Yu, Chao-Hong Tan, Zhihao Du, Shiliang Zhang. 14570-14580 [doi]
- DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-TuningDohoon Kim 0002, Donghun Kang, Taesup Moon. 14581-14602 [doi]
- EAGLE: Expert-Guided Self-Enhancement for Preference Alignment in Pathology Large Vision-Language ModelMeidan Ding, Jipeng Zhang, Wenxuan Wang 0001, Haiqin Zhong, Xiaoqin Wang, Xinheng Lyu, Wenting Chen, LinLin Shen. 14603-14619 [doi]
- CoT-ICL Lab: A Synthetic Framework for Studying Chain-of-Thought Learning from In-Context DemonstrationsVignesh Kothapalli, Hamed Firooz, Maziar Sanjabi. 14620-14642 [doi]
- Flexora: Flexible Low-Rank Adaptation for Large Language ModelsChenxing Wei, Yao Shu, Ying Tiffany He, Fei Yu. 14643-14682 [doi]
- QDTSynth: Quality-Driven Formal Theorem Synthesis for Enhancing Proving Performance of LLMsLei Wang, Ruobing Zuo, Gaolei He, Jianlin Wang, Zhengfeng Yang. 14683-14698 [doi]
- RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-ThoughtYi Lu, Jiawang Cao, Yongliang Wu, Bozheng Li, Licheng Tang, Yangguang Ji, Chong Wu, Jay Wu, Wenbo Zhu. 14699-14716 [doi]
- QAEval: Mixture of Evaluators for Question-Answering Task EvaluationTan Yue, Rui Mao 0010, Xuzhao Shi, Shuo Zhan, Zuhao Yang, Dongyan Zhao 0001. 14717-14730 [doi]
- Debiasing the Fine-Grained Classification Task in LLMs with Bias-Aware PEFTDaiying Zhao, Xinyu Yang, Hang Chen. 14731-14746 [doi]
- Demystifying Small Language Models for Edge DeploymentZhenyan Lu, Xiang Li 0067, Dongqi Cai 0001, Rongjie Yi, Fangming Liu, Wei Liu 0061, Jian Luan, Xiwen Zhang, Nicholas D. Lane, Mengwei Xu. 14747-14764 [doi]
- Adapt Once, Thrive with Updates: Transferable Parameter-Efficient Fine-Tuning on Evolving Base ModelsNaibin Gu, Peng Fu 0008, Xiyu Liu 0003, Ke Ma, Zheng Lin 0001, Weiping Wang 0005. 14765-14783 [doi]
- Can Vision-Language Models Evaluate Handwritten Math?Oikantik Nath, Hanani Bathina, Mohammed Safi Ur Rahman Khan, Mitesh M. Khapra. 14784-14814 [doi]
- Continual Gradient Low-Rank Projection Fine-Tuning for LLMsChenxu Wang, Yilin Lyu, Zicheng Sun, Liping Jing. 14815-14829 [doi]
- Towards Objective Fine-tuning: How LLMs' Prior Knowledge Causes Potential Poor Calibration?Ziming Wang, Zeyu Shi, Haoyi Zhou, Shiqi Gao, Qingyun Sun, Jianxin Li 0002. 14830-14853 [doi]
- Towards Robust ESG Analysis Against Greenwashing Risks: Aspect-Action Analysis with Cross-Category GeneralizationKeane Ong, Rui Mao 0010, Deeksha Varshney, Erik Cambria, Gianmarco Mengaldo. 14854-14879 [doi]
- HiddenDetect: Detecting Jailbreak Attacks against Multimodal Large Language Models via Monitoring Hidden StatesYilei Jiang, Xinyan Gao, Tianshuo Peng, Yingshui Tan, Xiaoyong Zhu, Bo Zheng 0007, Xiangyu Yue 0001. 14880-14893 [doi]
- SwiLTra-Bench: The Swiss Legal Translation BenchmarkJoel Niklaus, Jakob Merane, Luka Nenadic, Sina Ahmadi, Yingqiang Gao, Cyrill A. H. Chevalley, Claude Humbel, Christophe Gösken, Lorenzo Tanzi, Thomas Lüthi, Stefan Palombo, Spencer Poff, Boling Yang, Nan Wu, Matthew Guillod, Robin Mamié, Daniel Brunner, Julio Pereyra, Niko Grupen. 14894-14916 [doi]
- Two Intermediate Translations Are Better Than One: Fine-tuning LLMs for Document-level Translation RefinementYichen Dong, Xinglin Lyu, Junhui Li, Daimeng Wei, Min Zhang, Shimin Tao, Hao Yang. 14917-14933 [doi]
- Circuit Compositions: Exploring Modular Structures in Transformer-Based Language ModelsPhilipp Mondorf, Sondre Wold, Barbara Plank. 14934-14955 [doi]
- Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political QuestionsClara Lachenmaier, Judith Sieker, Sina Zarrieß. 14956-14975 [doi]
- GraphCheck: Breaking Long-Term Text Barriers with Extracted Knowledge Graph-Powered Fact-CheckingYingjian Chen, Haoran Liu, Yinhong Liu, Jinxiang Xie, Rui Yang, Han Yuan, Yanran Fu, Peng Yuan Zhou, Qingyu Chen 0001, James Caverlee, Irene Li. 14976-14995 [doi]
- SCULPT: Systematic Tuning of Long PromptsShanu Kumar, Akhila Yesantarao Venkata, Shubhanshu Khandelwal, Bishal Santra, Parag Agrawal, Manish Gupta 0001. 14996-15029 [doi]
- Crab: A Novel Configurable Role-Playing LLM with Assessing BenchmarkKai He 0001, Yucheng Huang, Wenqing Wang, Delong Ran, Dongming Sheng, Junxuan Huang, Qika Lin, Jiaxing Xu, Wenqiang Liu, Mengling Feng. 15030-15052 [doi]
- Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language ModelsYingshui Tan, Boren Zheng, Baihui Zheng, Kerui Cao, Huiyun Jing, Jincheng Wei, Jiaheng Liu, Yancheng He, Wenbo Su, Xiaoyong Zhu, Bo Zheng, Kaifu Zhang. 15053-15076 [doi]
- TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data SynthesisXiaorui Wu, Xiaofeng Mao, Fei Li, Xin Zhang, Xuanhong Li, Chong Teng, Donghong Ji, Zhuang Li. 15077-15099 [doi]
- Cross-Lingual Optimization for Language Transfer in Large Language ModelsJungseob Lee, Seongtae Hong, Hyeonseok Moon, HeuiSeok Lim. 15100-15119 [doi]
- CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic ModelingMinghui Fang 0002, Shengpeng Ji, Jialong Zuo, Hai Huang 0013, Yan Xia 0006, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu 0003, Gang Wang, Zhenhua Dong, Zhou Zhao 0001. 15120-15133 [doi]
- MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding BenchmarkXiang Yue, Tianyu Zheng, Yuansheng Ni, Yubo Wang, Kai Zhang 0033, Shengbang Tong, Yuxuan Sun 0002, Botao Yu, Ge Zhang 0009, Huan Sun 0001, Yu Su 0001, Wenhu Chen, Graham Neubig. 15134-15186 [doi]
- Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from ScratchXueru Wen, Jie Lou, Zichao Li, Yaojie Lu 0001, Xingyu, Yuqiu Ji, Guohai Xu, Hongyu Lin, Ben He, Xianpei Han, Le Sun 0001, Debing Zhang. 15187-15211 [doi]
- Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template RegionChak Tou Leong, Qingyu Yin, Jian Wang 0054, Wenjie Li 0002. 15212-15229 [doi]
- LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-SteeringJinhe Bi, Yujun Wang, Haokun Chen, Xun Xiao, Artur Hecker, Volker Tresp, Yunpu Ma. 15230-15250 [doi]
- Efficient Long Context Language Model Retrieval with CompressionMinju Seo, Jinheon Baek, Seongyun Lee, Sung Ju Hwang. 15251-15268 [doi]
- Ontology-Guided Reverse Thinking Makes Large Language Models Stronger on Knowledge Graph Question AnsweringRunxuan Liu, Luobei Luobei, Jiaqi Li 0004, Baoxin Wang, Ming Liu 0007, Dayong Wu, Shijin Wang 0001, Bing Qin 0001. 15269-15284 [doi]
- Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical ApplicationsZhe Chen 0024, Yusheng Liao, Shuyang Jiang, Pingjie Wang, Yiqiu Guo, Yanfeng Wang 0001, Yu Wang 0027. 15285-15309 [doi]
- Predicting Turn-Taking and Backchannel in Human-Machine Conversations Using Linguistic, Acoustic, and Visual SignalsYuxin Lin, Yinglin Zheng, Ming Zeng 0008, Wangzheng Shi. 15310-15322 [doi]
- A New Formulation of Zipf's Meaning-Frequency Law through Contextual DiversityRyo Nagata, Kumiko Tanaka-Ishii. 15323-15335 [doi]
- The Mirage of Model Editing: Revisiting Evaluation in the WildWanli Yang, Fei Sun 0001, Jiajun Tan, Xinyu Ma, Qi Cao, Dawei Yin, Huawei Shen, Xueqi Cheng. 15336-15354 [doi]
- LAQuer: Localized Attribution Queries in Content-grounded GenerationEran Hirsch, Aviv Slobodkin, David Wan, Elias Stengel-Eskin, Mohit Bansal, Ido Dagan. 15355-15370 [doi]
- EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement LearningXiaoqian Liu, Ke Wang, Yongbin Li, Yuchuan Wu, Wentao Ma, Aobo Kong, Fei Huang 0002, Jianbin Jiao, Junge Zhang. 15371-15396 [doi]
- DCG-SQL: Enhancing In-Context Learning for Text-to-SQL with Deep Contextual Schema Link GraphJihyung Lee, Jin-Seop Lee, Jaehoon Lee, Yunseok Choi, Jee-Hyong Lee 0001. 15397-15412 [doi]
- PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR AccuracyShuhao Guan, Moule Lin, Cheng Xu 0006, Xinyi Liu, Jinman Zhao, Jiexin Fan, Qi Xu, Derek Greene. 15413-15425 [doi]
- Digest the Knowledge: Large Language Models empowered Message Passing for Knowledge Graph Question AnsweringJunhong Wan, Tao Yu, Kunyu Jiang, Yao Fu, Weihao Jiang, Jiang Zhu. 15426-15442 [doi]
- RecLM: Recommendation Instruction TuningYangqin Jiang, Yuhao Yang 0002, Lianghao Xia, Da Luo, Kangyi Lin, Chao Huang 0001. 15443-15459 [doi]
- DS²-ABSA: Dual-Stream Data Synthesis with Label Refinement for Few-Shot Aspect-Based Sentiment AnalysisHongling Xu, Yice Zhang, Qianlong Wang, Ruifeng Xu 0001. 15460-15478 [doi]
- MISP-Meeting: A Real-World Dataset with Multimodal Cues for Long-form Meeting Transcription and SummarizationHangChen HangChen, Chao-Han Huck Yang, Jia-Chen Gu, Sabato Marco Siniscalchi, Jun Du 0002. 15479-15492 [doi]
- Learning Together to Perform Better: Teaching Small-Scale LLMs to Collaborate via Preferential Rationale TuningSohan Patnaik, Milan Aggarwal, Sumit Bhatia, Balaji Krishnamurthy. 15493-15512 [doi]
- MolRAG: Unlocking the Power of Large Language Models for Molecular Property PredictionZiting Xian, Jiawei Gu, Lingbo Li, Shangsong Liang. 15513-15531 [doi]
- SkillAggregation: Reference-free LLM-Dependent AggregationGuangzhi Sun, Anmol Kagrecha, Potsawee Manakul, Philip C. Woodland, Mark J. F. Gales. 15532-15548 [doi]
- MasRouter: Learning to Route LLMs for Multi-Agent SystemsYanwei Yue, Guibin Zhang, Boyang Liu, Guancheng Wan, Kun Wang, Dawei Cheng, Yiyan Qi. 15549-15572 [doi]
- Beyond Single Labels: Improving Conversational Recommendation through LLM-Powered Data AugmentationHaozhe Xu, Xiaohua Wang, Changze Lv, Xiaoqing Zheng. 15573-15590 [doi]
- Beyond One-Size-Fits-All: Tailored Benchmarks for Efficient EvaluationPeiwen Yuan, Yueqi Zhang, Shaoxiong Feng, Yiwei Li 0001, Xinglin Wang, Jiayi Shi, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li 0001. 15591-15615 [doi]
- iQUEST: An Iterative Question-Guided Framework for Knowledge Base Question AnsweringShuai Wang, Yinan Yu. 15616-15628 [doi]
- IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response TheoryWei Song 0010, Zhenya Huang, Cheng Cheng, Weibo Gao, Bihan Xu, Guanhao Zhao, Fei Wang 0063, Runze Wu 0001. 15629-15644 [doi]
- MLAS-LoRA: Language-Aware Parameters Detection and LoRA-Based Knowledge Transfer for Multilingual Machine TranslationTianyu Dong, Bo Li 0131, Jinsong Liu, ShaoLin Zhu, Deyi Xiong. 15645-15660 [doi]
- M2RC-EVAL: Massively Multilingual Repository-level Code Completion EvaluationJiaheng Liu, Ken Deng, Congnan Liu, Jian Yang 0030, Shukai Liu, He Zhu, Peng Zhao, Linzheng Chai, Yanan Wu, Jin Ke, Ge Zhang 0009, Zekun Moore Wang, Guoan Zhang, Yingshui Tan, Bangyu Xiang, Zhaoxiang Zhang 0001, Wenbo Su, Bo Zheng 0007. 15661-15684 [doi]
- Evaluating Design Decisions for Dual Encoder-based Entity DisambiguationSusanna Rücker, Alan Akbik. 15685-15701 [doi]
- How to Compare Things Properly? A Study of Argument Relevance in Comparative Question AnsweringIrina Nikishina, Saba Anwar, Nikolay Dolgov, Maria Manina, Daria Ignatenko, Artem Shelmanov, Chris Biemann. 15702-15720 [doi]
- FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and ChallengingZichen Tang, Haihong E, Ziyan Ma, Haoyang He, Jiacheng Liu, Zhongjun Yang, Zihua Rong, Rongjin Li, Kun Ji, Qing Huang, Xinyang Hu, Yang Liu, Qianhe Zheng. 15721-15749 [doi]
- Controllable Style Arithmetic with Language ModelsWeiqi Wang 0001, Wengang Zhou 0001, Zongmeng Zhang, Jie Zhao, Houqiang Li. 15750-15799 [doi]
- Masks Can be Learned as an Alternative to ExpertsPeiyu Liu, Tianwen Wei, Bo Zhu, Xin Zhao, Shuicheng Yan. 15800-15811 [doi]
- Program Synthesis Benchmark for Visual Programming in XLogoOnline EnvironmentChao Wen, Jacqueline Staub, Adish Singla. 15812-15838 [doi]
- Removal of Hallucination on Hallucination: Debate-Augmented RAGWentao Hu, Wengyu Zhang, Yiyang Jiang, Chen Jason Zhang, Xiaoyong Wei, Qing Li 0001. 15839-15853 [doi]
- CodeDPO: Aligning Code Models with Self Generated and Verified Source CodeKechi Zhang, Ge Li 0001, Yihong Dong, Jingjing Xu, Jun Zhang, Jing Su, Yongfei Liu, Zhi Jin. 15854-15871 [doi]
- ProxAnn: Use-Oriented Evaluations of Topic Models and Document ClusteringAlexander Miserlis Hoyle, Lorena Calvo-Bartolomé, Jordan Lee Boyd-Graber, Philip Resnik. 15872-15897 [doi]
- BOOKWORLD: From Novels to Interactive Agent Societies for Story CreationYiting Ran, Xintao Wang 0001, Tian Qiu, Jiaqing Liang, Yanghua Xiao, Deqing Yang. 15898-15912 [doi]
- Quantifying Lexical Semantic Shift via Unbalanced Optimal TransportRyo Kishino, Hiroaki Yamagiwa, Ryo Nagata, Sho Yokoi, Hidetoshi Shimodaira. 15913-15933 [doi]
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward SystemsHao Peng 0015, Yunjia Qi, Xiaozhi Wang, Zijun Yao 0002, Bin Xu 0001, Lei Hou 0001, Juanzi Li. 15934-15949 [doi]
- Adaptive and Robust Translation from Natural Language to Multi-model Query LanguagesGengyuan Shi, Chaokun Wang, Yabin Liu, Jiawei Ren. 15950-15965 [doi]
- SAKE: Steering Activations for Knowledge EditingMarco Scialanga, Thibault Laugel, Vincent Grari, Marcin Detyniecki. 15966-15978 [doi]
- Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMsDanni Liu, Jan Niehues. 15979-15996 [doi]
- Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge?Arduin Findeis, Floris Weers, Guoli Yin, Ke Ye, Ruoming Pang, Tom Gunter. 15997-16020 [doi]
- One for All: Update Parameterized Knowledge Across Multiple Models with Once EditWeitao Ma, Xiyuan Du, Xiaocheng Feng, Lei Huang 0021, Yichong Huang, Huiyi Zhang, Xiaoliang Yang, Baohang Li, Xiachong Feng, Ting Liu 0001, Bing Qin 0001. 16021-16034 [doi]
- VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a ServiceXiasi Wang, Tianliang Yao, Simin Chen, Runqi Wang, Lei Ye, Kuofeng Gao, Yi Huang, Yuan Yao. 16035-16050 [doi]
- The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMsNitay Calderon, Roi Reichart, Rotem Dror. 16051-16081 [doi]
- CrisisTS: Coupling Social Media Textual Data and Meteorological Time Series for Urgency ClassificationRomain Meunier, Farah Benamara, Véronique Moriceau, Zhongzheng Qiao, Savitha Ramasamy. 16082-16099 [doi]
- How to Mitigate Overfitting in Weak-to-strong Generalization?Junhao Shi, Qinyuan Cheng, Zhaoye Fei, Yining Zheng, Qipeng Guo, Xipeng Qiu. 16100-16118 [doi]
- Com²: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language ModelsKai Xiong 0002, Xiao Ding, Yixin Cao 0002, Yuxiong Yan, Li Du, Yufei Zhang, Jinglong Gao, Jiaqian Liu, Bing Qin 0001, Ting Liu 0001. 16119-16140 [doi]
- Dynamic Head Selection for Neural Lexicalized Constituency ParsingYang Hou 0001, Zhenghua Li. 16141-16155 [doi]
- My Words Imply Your Opinion: Reader Agent-Based Propagation Enhancement for Personalized Implicit Emotion AnalysisJian Liao, Yu Feng, Yujin Zheng, Jun Zhao, Suge Wang, Jianxing Zheng. 16156-16172 [doi]
- EvolveBench: A Comprehensive Benchmark for Assessing Temporal Awareness in LLMs on Evolving KnowledgeZhiyuan Zhu, Yusheng Liao, Zhe Chen 0024, Yuhao Wang, Yunfeng Guan 0001, Yanfeng Wang 0001, Yu Wang 0027. 16173-16188 [doi]
- Enabling LLM Knowledge Analysis via Extensive MaterializationYujia Hu, Tuan-Phong Nguyen, Shrestha Ghosh, Simon Razniewski. 16189-16202 [doi]
- Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow MatchingJialong Zuo, Shengpeng Ji, Minghui Fang 0002, Mingze Li, Ziyue Jiang 0001, Xize Cheng, Xiaoda Yang, Feiyang Chen, Xinyu Duan, Zhou Zhao 0001. 16203-16217 [doi]
- Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMsJingcheng Niu, Xingdi Yuan, Tong Wang, Hamidreza Saghir, Amir H. Abdi. 16218-16239 [doi]
- CritiQ: Mining Data Quality Criteria from Human PreferencesHonglin Guo, Kai Lv 0001, Qipeng Guo, Tianyi Liang, Zhiheng Xi, Demin Song, Qiuyinzhe Zhang, Yu Sun 0031, Kai Chen 0026, Xipeng Qiu, Tao Gui. 16240-16261 [doi]
- Theoretical Guarantees for Minimum Bayes Risk DecodingYuki Ichihara, Yuu Jinnai, Kaito Ariu, Tetsuro Morimura, Eiji Uchibe. 16262-16284 [doi]
- Mutual-Taught for Co-adapting Policy and Reward ModelsTianyuan Shi, Canbin Huang, Fanqi Wan, Longguang Zhong, Ziyi Yang, Weizhou Shen, Xiaojun Quan, Ming Yan 0008. 16285-16298 [doi]
- Enhancing Cross-Lingual Transfer through Reversible Transliteration: A Huffman-Based Approach for Low-Resource LanguagesWenhao Zhuang, Yuan Sun, Xiaobing Zhao. 16299-16313 [doi]
- Unmasking Style Sensitivity: A Causal Analysis of Bias Evaluation Instability in Large Language ModelsJiaxu Zhao 0002, Meng Fang, Kun Zhang, Mykola Pechenizkiy. 16314-16338 [doi]
- MockConf: A Student Interpretation Dataset: Analysis, Word- and Span-level Alignment and BaselinesDávid Javorský, Ondrej Bojar, François Yvon. 16339-16356 [doi]
- BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context LearningErcong Nie, Bo Shao, Mingyang Wang 0003, Zifeng Ding, Helmut Schmid, Hinrich Schütze. 16357-16374 [doi]
- What Matters in Evaluating Book-Length Stories? A Systematic Study of Long Story EvaluationDingyi Yang, Qin Jin. 16375-16398 [doi]
- PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level AdaptationLinhai Zhang, Jialong Wu 0007, Deyu Zhou, Yulan He 0001. 16399-16411 [doi]
- Enhancing Event-centric News Cluster Summarization via Data Sharpening and Localization InsightsLongyin Zhang, Bowei Zou, AiTi Aw. 16412-16426 [doi]
- MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence CalibrationZhitao He 0001, Sandeep Polisetty, Zhiyuan Fan, Yuchen Huang, Shujin Wu, Yi R. Fung 0001. 16427-16444 [doi]
- LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context ScenariosXiaodong Wu, Minhao Wang, Yichen Liu, Xiaoming Shi, He Yan, Xiangju Li, JunMin Zhu, Wei Zhang 0056. 16445-16468 [doi]
- Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data FilteringShuzheng Si, Haozhe Zhao, Gang Chen 0039, Cheng Gao, Yuzhuo Bai, Zhitong Wang, Kaikai An, Kangyang Luo, Chen Qian, Fanchao Qi, Baobao Chang, Maosong Sun 0001. 16469-16488 [doi]
- One-Shot is Enough: Consolidating Multi-Turn Attacks into Efficient Single-Turn Prompts for LLMsJunwoo Ha, Hyunjun Kim, Sangyoon Yu, Haon Park, Ashkan Yousefpour, Yuna Park, Suhyun Kim. 16489-16507 [doi]
- RAEmoLLM: Retrieval Augmented LLMs for Cross-Domain Misinformation Detection Using In-Context Learning Based on Emotional InformationZhiwei Liu 0003, Kailai Yang, Qianqian Xie, Christine de Kock, Sophia Ananiadou, Eduard H. Hovy. 16508-16523 [doi]
- Task-Specific Information Decomposition for End-to-End Dense Video CaptioningZhiyue Liu, Xinru Zhang, Jinyuan Liu. 16524-16536 [doi]
- CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-JudgesHaitao Li 0006, Junjie Chen, Qingyao Ai, Zhumin Chu, Yujia Zhou 0002, Qian Dong, Yiqun Liu 0001. 16537-16552 [doi]
- Explaining Matters: Leveraging Definitions and Semantic Expansion for Sexism DetectionSahrish Khan, Arshad Jhumka, Gabriele Pergola. 16553-16571 [doi]
- Private Memorization Editing: Turning Memorization into a Defense to Strengthen Data Privacy in Large Language ModelsElena Sofia Ruzzetti, Giancarlo A. Xompero, Davide Venditti, Fabio Massimo Zanzotto. 16572-16592 [doi]
- PhysReason: A Comprehensive Benchmark towards Physics-Based ReasoningXinyu Zhang, Yuxuan Dong, Yanrui Wu, Jiaxing Huang, Chengyou Jia, Basura Fernando, Mike Zheng Shou, Lingling Zhang, Jun Liu 0036. 16593-16615 [doi]
- Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific InformationYein Park, Chanwoong Yoon, Jungwoo Park, Minbyul Jeong, Jaewoo Kang. 16616-16643 [doi]
- Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-trainingZheheng Luo, Xin Zhang, Xiao Liu, Haoling Li, Yeyun Gong, Qi Chen, Peng Cheng. 16644-16656 [doi]
- Sheep's Skin, Wolf's Deeds: Are LLMs Ready for Metaphorical Implicit Hate Speech?Jingjie Zeng, Liang Yang 0003, Zekun Wang, Yuanyuan Sun 0002, Hongfei Lin. 16657-16677 [doi]
- Neuron-Level Sequential Editing for Large Language ModelsHoucheng Jiang, Junfeng Fang, Tianyu Zhang, Baolong Bi, An Zhang 0003, Ruipeng Wang, Tao Liang, Xiang Wang 0010. 16678-16702 [doi]
- Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-ExpertsShengzhuang Chen, Ying Wei 0001, Jonathan Richard Schwarz. 16703-16717 [doi]
- SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech TranslationKeqi Deng, Wenxi Chen, Xie Chen 0001, Philip C. Woodland. 16718-16734 [doi]
- VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language ModelsWenqian Cui, Xiaoqi Jiao, Ziqiao Meng, Irwin King. 16735-16753 [doi]
- RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within GenerationXiaoxi Li, Jiajie Jin, Yujia Zhou 0002, Yongkang Wu, Zhonghua Li, Ye Qi, Zhicheng Dou. 16754-16779 [doi]
- The Role of Deductive and Inductive Reasoning in Large Language ModelsChengkun Cai, Xu Zhao, Haoliang Liu, Zhongyu Jiang, Tianfang Zhang, Zongkai Wu, Jenq-Neng Hwang, Lei Li 0050. 16780-16790 [doi]
- Disentangling the Roles of Representation and Selection in Data PruningYupei Du, Yingjin Song, Hugh Mee Wong, Daniil Ignatev, Albert Gatt, Dong Nguyen 0002. 16791-16809 [doi]
- FRACTAL: Fine-Grained Scoring from Aggregate Text LabelsYukti Makhija, Priyanka Agrawal, Rishi Saket, Aravindan Raghuveer. 16810-16830 [doi]
- ACT: Knowledgeable Agents to Design and Perform Complex TasksMakoto Nakatsuji, Shuhei Tateishi, Yasuhiro Fujiwara, Ayaka Matsumoto, Narichika Nomoto, Yoshihide Sato. 16831-16861 [doi]
- Logical forms complement probability in understanding language model (and human) performanceYixuan Wang, Freda Shi. 16862-16877 [doi]
- Length Controlled Generation for Black-box LLMsYuxuan Gu, Wenjie Wang, Xiaocheng Feng, Weihong Zhong, Kun Zhu 0025, Lei Huang 0021, Ting Liu 0001, Bing Qin 0001, Tat-Seng Chua. 16878-16895 [doi]
- Improving Contextual Faithfulness of Large Language Models via Retrieval Heads-Induced OptimizationLei Huang, Xiaocheng Feng, Weitao Ma, Yuchun Fan, Xiachong Feng, Yangfan Ye, Weihong Zhong, Yuxuan Gu, Baoxin Wang, Dayong Wu, Guoping Hu, Bing Qin. 16896-16913 [doi]
- Global Eye: Breaking the "Fixed Thinking Pattern" during the Instruction Expansion ProcessWenxuan Lu, Wei Liu, Jian Luan, Bin Wang, SongHao Jiang, Tianning Zang. 16914-16928 [doi]
- On Synthesizing Data for Context Attribution in Question AnsweringGorjan Radevski, Kiril Gashteovski, Shahbaz Syed, Christopher Malon, Sebastien Nicolas, Chia-Chien Hung, Timo Sztyler, Verena Heußer, Wiem Ben Rim, Masafumi Enomoto, Kunihiro Takeoka, Masafumi Oyamada, Goran Glavas, Carolin Lawrence. 16929-16950 [doi]
- TST: A Schema-Based Top-Down and Dynamic-Aware Agent of Text-to-Table TasksPeiwen Jiang, Haitong Jiang, Ruhui Ma, Yvonne Jie Chen, Jinhua Cheng. 16951-16966 [doi]
- EventRAG: Enhancing LLM Generation with Event Knowledge GraphsZairun Yang, Yilin Wang, Zhengyan Shi, Yuan Yao, Lei Liang 0002, Keyan Ding, Emine Yilmaz, Huajun Chen, Qiang Zhang 0026. 16967-16979 [doi]
- Analyzing the Rapid Generalization of SFT via the Perspective of Attention Head Activation PatternsYang Zhao 0023, Li Du, Xiao Ding, Kai Xiong 0002, Ting Liu 0001, Bing Qin 0001. 16980-16992 [doi]
- Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMsWenxuan Wang 0001, Xiaoyuan Liu, Kuiyi Gao, Jen-tse Huang 0001, Youliang Yuan, Pinjia He, Shuai Wang 0011, Zhaopeng Tu. 16993-17006 [doi]
- Mis-prompt: Benchmarking Large Language Models for Proactive Error HandlingJiayi Zeng, Yizhe Feng, Mengliang He, Wenhui Lei, Wei Zhang, Zeming Liu, Xiaoming Shi, Aimin Zhou. 17007-17034 [doi]
- TripCraft: A Benchmark for Spatio-Temporally Fine Grained Travel PlanningSoumyabrata Chaudhuri, Pranav Purkar, Ritwik Raghav, Shubhojit Mallick, Manish Gupta, Abhik Jana, Shreya Ghosh 0006. 17035-17064 [doi]
- DualGuard: A Parameter Space Transformation Approach for Bidirectional Defense in Split-Based LLM Fine-TuningZihan Liu, Yizhen Wang, Rui Wang, Sai Wu. 17065-17080 [doi]
- Movie101v2: Improved Movie Narration BenchmarkZihao Yue, Yepeng Zhang, Ziheng Wang, Qin Jin. 17081-17095 [doi]
- Can LLMs Evaluate Complex Attribution in QA? Automatic Benchmarking using Knowledge GraphsNan Hu, Jiaoyan Chen 0001, Yike Wu, Guilin Qi, Hongru Wang 0003, Sheng Bi, Yongrui Chen 0002, Tongtong Wu, Jeff Z. Pan. 17096-17118 [doi]
- Value Portrait: Assessing Language Models' Values through Psychometrically and Ecologically Valid ItemsJongwook Han, Dongmin Choi, Woojung Song, Eun-Ju Lee, Yohan Jo. 17119-17159 [doi]
- FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature ImplementationWei Li 0232, Xin Zhang 0099, Zhongxin Guo, Shaoguang Mao, Wen Luo 0001, Guangyue Peng, Yangyu Huang, Houfeng Wang, Scarlett Li. 17160-17176 [doi]
- Do not Abstain! Identify and Solve the UncertaintyJingyu Liu, Jingquan Peng, Xiaopeng Wu, Xubin Li, Tiezheng Ge, Bo Zheng, Yong Liu. 17177-17197 [doi]
- Decoding by Contrasting Knowledge: Enhancing Large Language Model Confidence on Edited FactsBaolong Bi, Shenghua Liu, Lingrui Mei, Yiwei Wang 0001, Junfeng Fang, Pengliang Ji, Xueqi Cheng. 17198-17208 [doi]
- ImpliHateVid: A Benchmark Dataset and Two-stage Contrastive Learning Framework for Implicit Hate Speech Detection in VideosMohammad Zia Ur Rehman, Anukriti Bhatnagar, Omkar Kabde, Shubhi Bansal, Dr. Nagendra Kumar. 17209-17221 [doi]
- Improving Chain-of-Thought Reasoning via Quasi-Symbolic AbstractionsLeonardo Ranaldi, Marco Valentino, André Freitas. 17222-17240 [doi]
- Information Extraction from Visually Rich Documents using LLM-based Organization of Documents into Independent Textual SegmentsAniket Bhattacharyya, Anurag Tripathi, Ujjal Das, Archan Karmakar, Amit Pathak, Maneesh Gupta. 17241-17256 [doi]
- Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHubBohan Lyu 0001, Xin Cong, Heyang Yu, Pan Yang 0022, Cheng Qian 0008, Zihe Wang, Yujia Qin, Yining Ye, Yaxi Lu, Chen Qian, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu 0001, Maosong Sun 0001. 17257-17277 [doi]
- LLMs Can Simulate Standardized Patients via Agent CoevolutionZhuoyun Du, Lujie Zheng, Renjun Hu, Yuyang Xu, Xiawei Li, Ying Sun, Wei Chen 0001, Jian Wu 0001, Haolei Cai, Haochao Ying. 17278-17306 [doi]
- Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media PostsChristopher Bagdon, Aidan Combs, Carina Silberer, Roman Klinger. 17307-17330 [doi]
- Which Demographics do LLMs Default to During Annotation?Johannes Schäfer, Aidan Combs, Christopher Bagdon, Jiahui Li, Nadine Probol, Lynn Greschner, Sean Papay, Yarik Menchaca Resendiz, Aswathy Velutharambath, Amelie Wührl, Sabine Weber, Roman Klinger. 17331-17348 [doi]
- Can You Really Trust Code Copilot? Evaluating Large Language Models from a Code Security PerspectiveYutao Mou, Xiao Deng, Yuxiao Luo, Shikun Zhang, Wei Ye. 17349-17369 [doi]
- From Sub-Ability Diagnosis to Human-Aligned Generation: Bridging the Gap for Text Length Control via MarkerGenPeiwen Yuan, Chuyi Tan, Shaoxiong Feng, Yiwei Li 0001, Xinglin Wang, Yueqi Zhang, Jiayi Shi, Boyuan Pan, Yao Hu, Kan Li 0001. 17370-17390 [doi]
- AGD: Adversarial Game Defense Against Jailbreak Attacks in Large Language ModelsShilong Pan, Zhiliang Tian, Zhen Huang 0006, Wanlong Yu, Zhihua Wen, Xinwang Liu 0002, Kai Lu, Minlie Huang, Dongsheng Li 0001. 17391-17406 [doi]
- SCOP: Evaluating the Comprehension Process of Large Language Models from a Cognitive ViewYongjie Xiao, Hongru Liang, Peixin Qin, Yao Zhang, Wenqiang Lei. 17407-17431 [doi]
- Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table ReasoningPeiying Yu, Guoxin Chen, Jingjing Wang. 17432-17451 [doi]
- An Expanded Massive Multilingual Dataset for High-Performance Language Technologies (HPLT)Laurie Burchell, Ona De Gibert Bonet, Nikolay Arefyev, Mikko Aulamo, Marta Bañón, Pinzhen Chen, Mariia Fedorova, Liane Guillou, Barry Haddow, Jan Hajic, Jindrich Helcl, Erik Henriksson, Mateusz Klimaszewski, Ville Komulainen, Andrey Kutuzov, Joona Kytöniemi, Veronika Laippala, Petter Mæhlum, Bhavitvya Malik, Farrokh Mehryary, Vladislav Mikhailov, Nikita Moghe, Amanda Myntti, Dayyán O'Brien, Stephan Oepen, Proyag Pal, Jousia Piha, Sampo Pyysalo, Gema Ramírez-Sánchez, David Samuel, Pavel Stepachev, Jörg Tiedemann, Dusan Varis, Tereza Vojtechová, Jaume Zaragoza-Bernabeu. 17452-17485 [doi]
- Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data GenerationYue Yang 0006, Ajay Patel, Matt Deitke, Tanmay Gupta, Luca Weihs, Andrew Head, Mark Yatskar, Chris Callison-Burch, Ranjay Krishna, Aniruddha Kembhavi, Christopher Clark. 17486-17505 [doi]
- Hierarchical Attention Generates Better ProofsJianlong Chen, Chao Li, Yang Yuan, Andrew C. Yao. 17506-17520 [doi]
- Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal AgentsTianyi Men, Zhuoran Jin, Pengfei Cao, Yubo Chen 0001, Kang Liu 0001, Jun Zhao 0001. 17521-17541 [doi]
- It's Not Bragging If You Can Back It Up: Can LLMs Understand Braggings?Jingjie Zeng, Huayang Li, Liang Yang, Yuanyuan Sun, Hongfei Lin. 17542-17560 [doi]
- A Troublemaker with Contagious Jailbreak Makes Chaos in Honest TownsTianyi Men, Pengfei Cao, Zhuoran Jin, Yubo Chen 0001, Kang Liu 0001, Jun Zhao 0001. 17561-17587 [doi]
- Meta-Learning Neural Mechanisms rather than Bayesian PriorsMichael Eric Goodale, Salvador Mascarenhas, Yair Lakretz. 17588-17605 [doi]
- Shifting from Ranking to Set Selection for Retrieval Augmented GenerationDahyun Lee, Yongrae Jo, Haeju Park, Moontae Lee. 17606-17619 [doi]
- Understanding Large Language Model Vulnerabilities to Social Bias AttacksJiaxu Zhao 0002, Meng Fang, Fanghua Ye 0001, Ke Xu, Qin Zhang, Joey Tianyi Zhou, Mykola Pechenizkiy. 17620-17636 [doi]
- ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue AgentsZhigen Li, Jianxiang Peng, Yanmeng Wang, Yong Cao, Tianhao Shen, Minghui Zhang, Linxi Su, Shang Wu, Yihang Wu, Yuqian Wang, Ye Wang, Wei Hu, Jianfeng Li, Shaojun Wang, Jing Xiao, Deyi Xiong. 17637-17659 [doi]
- Pixel-Level Reasoning Segmentation via Multi-turn ConversationsDexian Cai, Xiaocui Yang, Yongkang Liu 0002, Daling Wang, Shi Feng 0001, Yifei Zhang 0003, Soujanya Poria. 17660-17679 [doi]
- Fixing Distribution Shifts of LLM Self-Critique via On-Policy Self-Play TrainingRong Bao, Donglei Yu, Kai Fan 0002, Minpeng Liao. 17680-17700 [doi]
- Inferring Functionality of Attention Heads from their ParametersAmit Elhelo, Mor Geva. 17701-17733 [doi]
- Faithful and Robust LLM-Driven Theorem Proving for NLI ExplanationsXin Quan, Marco Valentino, Louise A. Dennis, André Freitas. 17734-17755 [doi]
- Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial EditingJiakuan Xie, Pengfei Cao, Yubo Chen 0001, Kang Liu 0001, Jun Zhao 0001. 17756-17780 [doi]
- Masking in Multi-hop QA: An Analysis of How Language Models Perform with Context PermutationWenyu Huang, Pavlos Vougiouklis, Mirella Lapata, Jeff Z. Pan. 17781-17795 [doi]
- From Human Reading to NLM Understanding: Evaluating the Role of Eye-Tracking Data in Encoder-Based ModelsLuca Dini, Lucia Domenichelli, Dominique Brunato, Felice dell'Orletta. 17796-17813 [doi]
- Optimizing Question Semantic Space for Dynamic Retrieval-Augmented Multi-hop Question AnsweringLinhao Ye, Lang Yu, Zhikai Lei, Qin Chen 0001, Jie Zhou 0015, Liang He 0001. 17814-17824 [doi]
- Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMsXiaoyuan Liu, Wenxuan Wang 0001, Youliang Yuan, Jen-tse Huang 0001, Qiuzhi Liu, Pinjia He, Zhaopeng Tu. 17825-17846 [doi]
- SceneGenAgent: Precise Industrial Scene Generation with Coding AgentXiao Xia, Dan Zhang, Zibo Liao, Zhenyu Hou, Tianrui Sun, Jing Li, Ling Fu, Yuxiao Dong. 17847-17875 [doi]
- ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language ModelsHanxing Ding, Shuchang Tao, Liang Pang, Zihao Wei, Jinyang Gao, Bolin Ding, Huawei Shen, Xueqi Cheng. 17876-17891 [doi]
- Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case StudyBashar Alhafni, Nizar Habash. 17892-17914 [doi]
- From Isolates to Families: Using Neural Networks for Automated Language AffiliationFrederic Blum, Steffen Herbold, Johann-Mattis List. 17915-17927 [doi]
- ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language ModelsXuxu Liu, Siyuan Liang, Mengya Han, Yong Luo 0002, Aishan Liu, Xiantao Cai, Zheng He, Dacheng Tao. 17928-17947 [doi]
- Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-ExpertsXue Zhang, Yunlong Liang, Fandong Meng, Songming Zhang 0001, Yufeng Chen 0005, Jinan Xu, Jie Zhou 0016. 17948-17963 [doi]
- When Harry Meets Superman: The Role of The Interlocutor in Persona-Based Dialogue GenerationDaniela Occhipinti, Marco Guerini, Malvina Nissim. 17964-17985 [doi]
- ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMsZhenliang Zhang, Xinyu Hu 0001, Huixuan Zhang, Junzhe Zhang 0004, Xiaojun Wan 0001. 17986-18002 [doi]
- Revisit Self-Debugging with Self-Generated Tests for Code GenerationXiancai Chen, Zhengwei Tao, Kechi Zhang, Changzhi Zhou, Xinyu Zhang, Wanli Gu, Yuanpeng He, Mengdi Zhang, Xunliang Cai, Haiyan Zhao 0001, Zhi Jin. 18003-18023 [doi]
- InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-trainingDingdong Wang, Jin Xu, Ruihang Chu, Zhifang Guo, Xiong Wang, Jincenzi Wu, Dongchao Yang, Shengpeng Ji, Junyang Lin. 18024-18046 [doi]
- Exploring LLMs' Ability to Spontaneously and Conditionally Modify Moral Expressions through Text ManipulationCandida Maria Greco, Lucio La Cava, Lorenzo Zangari, Andrea Tagarelli. 18047-18070 [doi]
- Mixture of Ordered Scoring Experts for Cross-prompt Essay Trait ScoringPo-Kai Chen, Bo-Wei Tsai, Shao-Kuan Wei, Chien-Yao Wang, Jia-Ching Wang, Yi-Ting Huang. 18071-18084 [doi]
- Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMsAnshumann, Mohd Abbas Zaidi, Akhil Kedia, Jinwoo Ahn, Taehwak Kwon, Kangwook Lee 0004, Haejun Lee, Joohyung Lee. 18085-18108 [doi]
- Enhancing Spoken Discourse Modeling in Language Models Using Gestural CuesVarsha Suresh, Muhammad Hamza Mughal, Christian Theobalt, Vera Demberg. 18109-18123 [doi]
- ExploraCoder: Advancing Code Generation for Multiple Unseen APIs via Planning and Chained ExplorationYunkun Wang, Yue Zhang 0004, Zhen Qin 0004, Chen Zhi, Binhua Li, Fei Huang 0002, Yongbin Li, ShuiGuang Deng. 18124-18145 [doi]
- Segment First or Comprehend First? Explore the Limit of Unsupervised Word Segmentation with Large Language ModelsZihong Zhang, Liqi He, Zuchao Li, Lefei Zhang, Hai Zhao 0001, Bo Du 0001. 18146-18163 [doi]
- RUBY: An Effective Framework for Multi-Constraint Multi-Hop Question GenerationWenzhuo Zhao, Shuangyin Li. 18164-18188 [doi]
- Can Indirect Prompt Injection Attacks Be Detected and Removed?Yulin Chen, Haoran Li 0003, Yuan Sui, Yufei He, Yue Liu 0008, Yangqiu Song, Bryan Hooi. 18189-18206 [doi]
- Identifying Open Challenges in Language IdentificationRob van der Goot. 18207-18227 [doi]
- The Distracting Effect: Understanding Irrelevant Passages in RAGChen Amiraz, Florin Cuconasu, Simone Filice, Zohar S. Karnin. 18228-18258 [doi]
- Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource LanguagesZeli Su, Ziyin Zhang, Guixian Xu, Jianing Liu, Xu Han, Ting Zhang, Yushuang Dong. 18259-18270 [doi]
- Graphically Speaking: Unmasking Abuse in Social Media with Conversation InsightsCélia Nouri, Chloé Clavel, Jean-Philippe Cointet. 18271-18286 [doi]
- CodeTool: Enhancing Programmatic Tool Invocation of LLMs via Process SupervisionYifei Lu, Fanghua Ye 0004, Jian Li, Qiang Gao, Cheng Liu, Haibo Luo, Nan Du, Xiaolong Li, Feiliang Ren. 18287-18304 [doi]
- RARE: Retrieval-Augmented Reasoning Enhancement for Large Language ModelsHieu Tran, Zonghai Yao, Zhichao Yang 0001, Junda Wang, Yifan Zhang, Shuo Han, Feiyun Ouyang, Hong Yu 0001. 18305-18330 [doi]
- Defense Against Prompt Injection Attack by Leveraging Attack TechniquesYulin Chen, Haoran Li 0003, Zihao Zheng, Dekai Wu, Yangqiu Song, Bryan Hooi. 18331-18347 [doi]
- Acquisition and Application of Novel Knowledge in Large Language ModelsZiyu Shang, Jianghan Liu, Zhizhao Luo, Peng Wang 0004, Wenjun Ke, Jiajun Liu, Zijie Xu 0003, Guozheng Li. 18348-18368 [doi]
- DNCASR: End-to-End Training for Speaker-Attributed ASRXianrui Zheng, Chao Zhang 0031, Philip C. Woodland. 18369-18383 [doi]
- Exploring Persona Sentiment Sensitivity in Personalized Dialogue GenerationYonghyun Jun, Hwanhee Lee. 18384-18402 [doi]
- AntiLeakBench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World KnowledgeXiaobao Wu, Liangming Pan, Yuxi Xie, Ruiwen Zhou, Shuai Zhao 0007, Yubo Ma, Mingzhe Du, Rui Mao 0010, Anh Tuan Luu, William Yang Wang. 18403-18419 [doi]
- LLM-Guided Semantic-Aware Clustering for Topic ModelingJianghan Liu, Ziyu Shang, Wenjun Ke, Peng Wang, Zhizhao Luo, Jiajun Liu, Guozheng Li, Yining Li. 18420-18435 [doi]
- Hierarchical Bracketing Encodings for Dependency Parsing as TaggingAna Ezquerro, David Vilares 0001, Anssi Yli-Jyrä, Carlos Gómez-Rodríguez. 18436-18450 [doi]
- OASIS: Order-Augmented Strategy for Improved Code SearchZuchen Gao, Zizheng Zhan, Xianming Li, Erxin Yu, Haotian Zhang, Chenbin Chenbin, Yuqun Zhang, Jing Li 0049. 18451-18467 [doi]
- Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?Yancheng He, Shilong Li, Jiaheng Liu, Weixun Wang, Xingyuan Bu, Ge Zhang 0009, Z. Y. Peng, Zhaoxiang Zhang 0001, Zhicheng Zheng, Wenbo Su, Bo Zheng 0007. 18468-18489 [doi]
- OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human PreferenceXiangyu Zhao, Shengyuan Ding, Zicheng Zhang, Haian Huang, Maosong Cao, Jiaqi Wang 0003, Weiyun Wang, XinYu Fang, Wenhai Wang, Guangtao Zhai, Hua Yang 0001, Haodong Duan, Kai Chen 0026. 18490-18515 [doi]
- Tree-KG: An Expandable Knowledge Graph Construction Framework for Knowledge-intensive DomainsSongjie Niu, Kaisen Yang, Rui Zhao, Yichao Liu, Zonglin Li, Hongning Wang, Wenguang Chen. 18516-18529 [doi]
- Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable MetricYuming Yang, Yang Nan, Junjie Ye, Shihan Dou, Xiao Wang, Shuo Li, Huijie Lv, Tao Gui, Qi Zhang 0001, Xuanjing Huang 0001. 18530-18549 [doi]
- Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-ReasoningNan Huo, Jinyang Li 0003, Bowen Qin, Ge Qu, Xiaolong Li, Xiaodong Li 0009, Chenhao Ma 0001, Reynold Cheng. 18550-18574 [doi]
- Minimal Pair-Based Evaluation of Code-SwitchingIgor Sterner, Simone Teufel. 18575-18598 [doi]
- DNASpeech: A Contextualized and Situated Text-to-Speech Dataset with Dialogues, Narratives and ActionsChuanqi Cheng, Hongda Sun 0001, Bo Du 0001, Shuo Shang, Xinrong Hu, Rui Yan 0001. 18599-18616 [doi]
- LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech SynthesisQingkai Fang, Yan Zhou, Shoutao Guo, Shaolei Zhang, Yang Feng 0004. 18617-18629 [doi]
- Error Comparison Optimization for Large Language Models on Aspect-Based Sentiment AnalysisQianlong Wang, Keyang Ding, Hengxin Gao, Hui Wang, Ruifeng Xu. 18630-18646 [doi]
- The AI Gap: How Socioeconomic Status Affects Language Technology InteractionsElisa Bassignana, Amanda Cercas Curry, Dirk Hovy. 18647-18664 [doi]
- Probing LLMs for Multilingual Discourse Generalization Through a Unified Label SetFlorian Eichin, Yang Janet Liu, Barbara Plank, Michael A. Hedderich. 18665-18684 [doi]
- Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast AsiaSamuel Cahyawijaya, Holy Lovenia, Joel Ruben Antony Moniz, Tack Hwa Wong, Mohammad Rifqi Farhansyah, Thant Thiri Maung, Frederikus Hudi, David Anugraha, Muhammad Ravi Shulthan Habibi, Muhammad Reza Qorib, Amit Agarwal, Joseph Marvin Imperial, Hitesh Laxmichand Patel, Vicky Feliren, Bahrul Ilmi Nasution, Manuel Antonio Rufino, Genta Indra Winata, Rian Adam Rajagede, Carlos Rafael Catalan, Mohamed Fazli Mohamed Imam, Priyaranjan Pattnayak, Salsabila Zahirah Pranida, Kevin Pratama, Yeshil Bangera, Adisai Na-Thalang, Patricia Nicole Monderin, Yueqi Song, Christian Simon, Lynnette Hui Xian Ng, Richardy Lobo' Sapan, Taki Hasan Rafi, Bin Wang 0040, Supryadi, Kanyakorn Veerakanjana, Piyalitt Ittichaiwong, Matthew Theodore Roque, Karissa Vincentio, Takdanai Kreangphet, Phakphum Artkaew, Kadek Hendrawan Palgunadi, Yanzhi Yu, Rochana Prih Hastuti, William Nixon, Mithil Bangera, Adrian Xuan Wei Lim, Aye Hninn Khine, Hanif Muhammad Zhafran, Teddy Ferdinan, Audra Aurora Izzani, Ayushman Singh, Evan, Jauza Akbar Krito, Michael Anugraha, Fenal Ashokbhai Ilasariya, Haochen Li, John Amadeo Daniswara, Filbert Aurelian Tjiaranata, Eryawan Presma Yulianrifat, Can Udomcharoenchaikit, Fadil Risdian Ansori, Mahardika Krisna Ihsani, Giang Nguyen, Anab Maulana Barik, Dan John Velasco, Rifo Ahmad Genadi, Saptarshi Saha, Chengwei Wei, Isaiah Edri W. Flores, Kenneth Ko Han Chen, Anjela Gail Santos, Wan Shen Lim, Kaung Si Phyo, Tim Santos, Meisyarah Dwiastuti, Jiayun Luo, Jan Christian Blaise Cruz, Ming Shan Hee, Ikhlasul Akmal Hanif, M. Alif Al Hakim, Muhammad Rizky Sya'ban, Kun Kerdthaisong, Lester James Validad Miranda, Fajri Koto, Tirana Noor Fatyanosa, Alham Fikri Aji, Jostin Jerico Rosal, Jun Kevin, Robert Wijaya, Onno P. Kampman, Ruochen Zhang, Börje F. Karlsson, Peerat Limkonchotiwat. 18685-18717 [doi]
- Soundwave: Less is More for Speech-Text Alignment in LLMsYuhao Zhang, Zhiheng Liu, Fan Bu, Ruiyu Zhang, Benyou Wang, Haizhou Li 0001. 18718-18738 [doi]
- RoToR: Towards More Reliable Responses for Order-Invariant InputsSoyoung Yoon, Dongha Ahn, Youngwon Lee 0003, Minkyu Jung, HyungJoo Jang, Seung-won Hwang. 18739-18760 [doi]
- Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual EvaluationShivalika Singh, Angelika Romanou, Clémentine Fourrier, David Ifeoluwa Adelani, Jian Gang Ngui, Daniel Vila-Suero, Peerat Limkonchotiwat, Kelly Marchisio, Wei Qi Leong, Yosephine Susanto, Raymond Ng, Shayne Longpre, Sebastian Ruder, Wei-Yin Ko, Antoine Bosselut, Alice Oh, André F. T. Martins, Leshem Choshen, Daphne Ippolito, Enzo Ferrante, Marzieh Fadaee, Beyza Ermis, Sara Hooker. 18761-18799 [doi]
- Improving Dialogue Discourse Parsing through Discourse-aware Utterance ClarificationYaxin Fan, Peifeng Li, Qiaoming Zhu. 18800-18816 [doi]
- ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMsYan Yang, Yixia Li, Hongru Wang 0003, Xuetao Wei, James Jianqiao Yu, Yun Chen 0007, Guanhua Chen 0001. 18817-18829 [doi]
- Words of Warmth: Trust and Sociability Norms for over 26k English WordsSaif M. Mohammad. 18830-18850 [doi]
- BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language ModelsLindia Tjuatja, Graham Neubig. 18851-18873 [doi]
- HAF-RM: A Hybrid Alignment Framework for Reward Model TrainingShujun Liu, Xiaoyu Shen, Yuhang Lai, Siyuan Wang, Shengbin Yue, Zengfeng Huang, Xuanjing Huang 0001, Zhongyu Wei. 18874-18893 [doi]
- CULEMO: Cultural Lenses on Emotion - Benchmarking LLMs for Cross-Cultural Emotion UnderstandingTadesse Destaw Belay, Ahmed Haj Ahmed, Alvin Grissom II, Iqra Ameer, Grigori Sidorov, Olga Kolesnikova, Seid Muhie Yimam. 18894-18909 [doi]
- DiffPO: Diffusion-styled Preference Optimization for Inference Time Alignment of Large Language ModelsRuizhe Chen, Wenhao Chai, Zhifei Yang, Xiaotian Zhang, Ziyang Wang, Tony Q. S. Quek, Joey Tianyi Zhou, Soujanya Poria, Zuozhu Liu. 18910-18925 [doi]
- MemeQA: Holistic Evaluation for Meme UnderstandingKhoi P. N. Nguyen, Terrence Li, Derek Lou Zhou, Gabriel Xiong, Pranav Balu, Nandhan Alahari, Alan Huang, Tanush Chauhan, Harshavardhan Bala, Emre Guzelordu, Affan Kashfi, Aaron Xu, Suyesh Shrestha, Megan Kim Vu, Jerry Yining Wang, Vincent Ng 0001. 18926-18946 [doi]
- LoGU: Long-form Generation with Uncertainty ExpressionsRuihan Yang, Caiqi Zhang, Zhisong Zhang, Xinting Huang, Sen Yang, Nigel Collier, Dong Yu 0001, Deqing Yang. 18947-18968 [doi]
- KiRAG: Knowledge-Driven Iterative Retriever for Enhancing Retrieval-Augmented GenerationJinyuan Fang, Zaiqiao Meng, Craig Macdonald. 18969-18985 [doi]
- Enhancing Lexicon-Based Text Embeddings with Large Language ModelsYibin Lei, Tao Shen 0001, Yu Cao 0014, Andrew Yates. 18986-19001 [doi]
- CoCoLex: Confidence-guided Copy-based Decoding for Grounded Legal Text GenerationT. Y. S. S. Santosh, Youssef Tarek Elkhayat, Oana Ichim, Pranav Shetty, Dongsheng Wang 0005, Zhiqiang Ma, Armineh Nourbakhsh, Xiaomo Liu. 19002-19018 [doi]
- Beyond N-Grams: Rethinking Evaluation Metrics and Strategies for Multilingual Abstractive SummarizationItai Mondshine, Tzuf Paz-Argaman, Reut Tsarfaty. 19019-19035 [doi]
- CC-Tuning: A Cross-Lingual Connection Mechanism for Improving Joint Multilingual Supervised Fine-TuningYangfan Ye, Xiaocheng Feng, Zekun Yuan, Xiachong Feng, Libo Qin 0001, Lei Huang 0021, Weitao Ma, Yichong Huang, Zhirui Zhang, Yunfei Lu, Xiaohui Yan, Duyu Tang, Dandan Tu, Bing Qin 0001. 19036-19051 [doi]
- SConU: Selective Conformal Uncertainty in Large Language ModelsZhiyuan Wang, Qingni Wang, Yue Zhang, Tianlong Chen 0001, Xiaofeng Zhu 0001, Xiaoshuang Shi, Kaidi Xu. 19052-19075 [doi]
- MegaPairs: Massive Data Synthesis for Universal Multimodal RetrievalJunjie Zhou 0001, Yongping Xiong, Zheng Liu 0011, Ze Liu, Shitao Xiao, Yueze Wang, Bo Zhao 0015, Chen Jason Zhang, Defu Lian. 19076-19095 [doi]
- When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTsXinyue Shen 0001, Yun Shen, Michael Backes 0001, Yang Zhang 0016. 19096-19111 [doi]
- UniCodec: Unified Audio Codec with Single Domain-Adaptive CodebookYidi Jiang, Qian Chen 0003, Shengpeng Ji, Yu Xi, Wen Wang 0001, Chong Zhang 0003, Xianghu Yue, Shiliang Zhang, Haizhou Li 0001. 19112-19124 [doi]
- KERL: Knowledge-Enhanced Personalized Recipe Recommendation using Large Language ModelsFnu Mohbat, Mohammed J. Zaki. 19125-19141 [doi]
- Multilingual Arbitration: Optimizing Data Pools to Accelerate Multilingual ProgressAyomide Odumakinde, Daniel D'Souza, Pat Verga, Beyza Ermis, Sara Hooker. 19142-19164 [doi]
- Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language ModelsYuheng Lu, Bingshuo Qian, Caixia Yuan, Huixing Jiang, Xiaojie Wang. 19165-19181 [doi]
- Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language ModelsYancheng He, Shilong Li, Jiaheng Liu, Yingshui Tan, Weixun Wang, Hui Huang 0021, Xingyuan Bu, Hangyu Guo, Chengwei Hu, Boren Zheng, Zhuoran Lin, Dekai Sun, Zhicheng Zheng, Wenbo Su, Bo Zheng 0007. 19182-19208 [doi]
- PVP: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness RatingsJunseo Kim, Jongwook Han, Dongmin Choi, Jongwook Yoon, Eun-Ju Lee, Yohan Jo. 19209-19237 [doi]
- Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information RetrievalZheng Liu 0011, Ze Liu, Zhengyang Liang, Junjie Zhou 0001, Shitao Xiao, Chao Gao, Chen Jason Zhang, Defu Lian. 19238-19261 [doi]
- Tunable LLM-based Proactive Recommendation AgentMingze Wang, Chongming Gao, Wenjie Wang 0007, Yangyang Li, Fuli Feng. 19262-19276 [doi]
- AgentRM: Enhancing Agent Generalization with Reward ModelingYu Xia, Jingru Fan, Weize Chen, Siyu Yan, Xin Cong, Zhong Zhang, Yaxi Lu, Yankai Lin, Zhiyuan Liu, Maosong Sun 0001. 19277-19290 [doi]
- From Outcomes to Processes: Guiding PRM Learning from ORM for Inference-Time AlignmentBin Xie, Bingbing Xu 0001, Yige Yuan, Shengmao Zhu, Huawei Shen. 19291-19307 [doi]
- Segment-Based Attention Masking for GPTsShahar Katz, Liran Ringel, Yaniv Romano, Lior Wolf. 19308-19322 [doi]
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space CapacityYuri Kuratov, Mikhail Arkhipov, Aydar Bulatov, Mikhail Burtsev 0001. 19323-19339 [doi]
- Bi-Tuning with Collaborative Information for Controllable LLM-based Sequential RecommendationXinyu Zhang, Linmei Hu, Luhao Zhang, Wentao Cheng, Yashen Wang, Ge Shi 0002, Chong Feng 0001, Liqiang Nie. 19340-19351 [doi]
- A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks AlignmentJean-Philippe Corbeil, Amin Dada, Jean-Michel Attendu, Asma Ben Abacha, Alessandro Sordoni, Lucas Caccia, François Beaulieu, Thomas Lin, Jens Kleesiek, Paul Vozila. 19352-19374 [doi]
- DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-ExpertsYuchen Feng, Bowen Shen, Naibin Gu, Jiaxuan Zhao, Peng Fu 0008, Zheng Lin 0001, Weiping Wang 0005. 19375-19394 [doi]
- DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt CompressionYi Zhao, Zuchao Li, Hai Zhao 0001, Baoyuan Qi, Liu Guoming. 19395-19407 [doi]
- Computation Mechanism Behind LLM Position GeneralizationChi Han, Heng Ji 0001. 19408-19424 [doi]
- IPO: Your Language Model is Secretly a Preference ClassifierShivank Garg, Ayush Singh, Shweta Singh, Paras Chopra. 19425-19441 [doi]
- Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-upJiahao Yuan, Dehui Du, Hao Zhang, Zixiang Di, Usman Naseem. 19442-19459 [doi]
- Déjà Vu? Decoding Repeated Reading from Eye MovementsYoav Meiri, Omer Shubi, Cfir Avraham Hadar, Ariel Kreisberg Nitzav, Yevgeni Berzak. 19460-19482 [doi]
- LLMs can be easily Confused by Instructional DistractionsYerin Hwang, Yongil Kim, Jahyun Koo 0004, Taegwan Kang, Hyunkyung Bae, Kyomin Jung. 19483-19496 [doi]
- PlanGenLLMs: A Modern Survey of LLM Planning CapabilitiesHui Wei, Zihao Zhang, Shenghua He, Tian Xia, Shijia Pan, Fei Liu. 19497-19521 [doi]
- IAM: Efficient Inference through Attention Mapping between Different-scale LLMsYi Zhao, Zuchao Li, Hai Zhao 0001. 19522-19533 [doi]
- nvAgent: Automated Data Visualization from Natural Language via Collaborative Agent WorkflowGeliang Ouyang, Jingyao Chen, Zhihe Nie, Yi Gui, Yao Wan 0001, Hongyu Zhang 0002, Dongping Chen. 19534-19567 [doi]
- ZIPA: A family of efficient models for multilingual phone recognitionJian Zhu, Farhan Samir, Eleanor Chodroff, David R. Mortensen. 19568-19585 [doi]
- GRACE: A Granular Benchmark for Evaluating Model Calibration against Human CalibrationYoo yeon Sung, Eve Fleisig, Yu Hou, Ishan Upadhyay, Jordan Lee Boyd-Graber. 19586-19587 [doi]
- Dynamic Evaluation with Cognitive Reasoning for Multi-turn Safety of Large Language ModelsLanxue Zhang, Yanan Cao 0001, Yuqiang Xie, Fang Fang 0009, Yangxi Li. 19588-19608 [doi]
- From Tools to Teammates: Evaluating LLMs in Multi-Session Coding InteractionsNathanaël Carraz Rakotonirina, Mohammed Hamdy, Jon Ander Campos, Lucas Weber, Alberto Testoni, Marzieh Fadaee, Sandro Pezzelle, Marco Del Tredici. 19609-19642 [doi]
- Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous ConstraintsJunxiao Yang, Zhexin Zhang, Shiyao Cui, Hongning Wang, Minlie Huang. 19643-19655 [doi]
- Multilingual Text-to-Image Generation Magnifies Gender StereotypesFelix Friedrich, Katharina Hämmerl, Patrick Schramowski, Manuel Brack, Jindrich Libovický, Alexander Fraser 0001, Kristian Kersting. 19656-19679 [doi]
- Adversarial Alignment with Anchor Dragging Drift (A³D²): Multimodal Domain Adaptation with Partially Shifted ModalitiesJun Sun, Xinxin Zhang, Simin Hong, Jian Zhu, Lingfang Zeng. 19680-19690 [doi]
- A Reality Check on Context Utilisation for Retrieval-Augmented GenerationLovisa Hagström, Sara Vera Marjanovic, Haeun Yu, Arnav Arora, Christina Lioma, Maria Maistro, Pepa Atanasova, Isabelle Augenstein. 19691-19730 [doi]
- CU-MAM: Coherence-Driven Unified Macro-Structures for Argument MiningDebela Gemechu, Chris Reed 0001. 19731-19749 [doi]
- Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to ArtifactsHongyu Chen 0008, Seraphina Goldfarb-Tarrant. 19750-19766 [doi]
- Text-to-ES Bench: A Comprehensive Benchmark for Converting Natural Language to Elasticsearch QueryDongge Xue, Zhili Pu, Zhentao Xia, Hongli Sun, Ruihui Hou, Guangya Yu, Yupian Lin, Yongqi Fan, JingPing Liu, Tong Ruan. 19767-19790 [doi]
- AlignDistil: Token-Level Language Model Alignment as Adaptive Policy DistillationSongming Zhang 0001, Xue Zhang, Tong Zhang, Bojie Hu, Yufeng Chen 0005, Jinan Xu. 19791-19807 [doi]
- DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree TraversalVaibhav Aggarwal, Ojasv Kamal, Abhinav Japesh, Zhijing Jin 0001, Bernhard Schölkopf. 19808-19855 [doi]
- Steering off Course: Reliability Challenges in Steering Language ModelsPatrick Queiroz Da Silva, Hari Sethuraman, Dheeraj Rajagopal, Hannaneh Hajishirzi, Sachin Kumar 0009. 19856-19882 [doi]
- Impartial Multi-task Representation Learning via Variance-invariant Probabilistic DecodingDou Hu 0001, Lingwei Wei, Wei Zhou 0019, Songlin Hu 0001. 19883-19897 [doi]
- If Eleanor Rigby Had Met ChatGPT: A Study on Loneliness in a Post-LLM WorldAdrian de Wynter. 19898-19913 [doi]
- Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization on Multi-party ConversationLuyao Cheng, Hui Wang, Chong Deng, Siqi Zheng, Yafeng Chen, Rongjie Huang, Qinglin Zhang, Qian Chen, Xihao Li, Wen Wang. 19914-19928 [doi]
- Vulnerability of LLMs to Vertically Aligned Text ManipulationsZhecheng Li, Yiwei Wang 0001, Bryan Hooi, Yujun Cai, Zhen Xiong, Nanyun Peng 0001, Kai-Wei Chang. 19929-19941 [doi]
- AutoMixer: Checkpoint Artifacts as Automatic Data MixersErnie Chang, Yang Li 0183, Patrick Huber, Vish Vogeti, David Kant, Yangyang Shi, Vikas Chandra. 19942-19953 [doi]
- Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum FlowBehrooz Azarkhalili, Maxwell W. Libbrecht. 19954-19974 [doi]
- Beyond Prompting: An Efficient Embedding Framework for Open-Domain Question AnsweringZhanghao Hu, Hanqi Yan, Qinglin Zhu, Zhenyi Shen, Yulan He 0001, Lin Gui 0003. 19975-19990 [doi]
- AIR-Bench: Automated Heterogeneous Information Retrieval BenchmarkJianlyu Chen, Nan Wang, Chaofan Li, Bo Wang, Shitao Xiao, Han Xiao, Hao Liao, Defu Lian, Zheng Liu 0011. 19991-20022 [doi]
- We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?Runqi Qiao, Qiuna Tan, Guanting Dong, Minhui Wu, Chong Sun, Xiaoshuai Song, Jiapeng Wang, Zhuoma Gongque, Shanglin Lei, Yifan Zhang, Zhe Wei, Miaoxuan Zhang, Runfeng Qiao, Xiao Zong, Yida Xu, Peiqing Yang 0003, Zhimin Bao, Muxi Diao, Chen Li, Honggang Zhang 0002. 20023-20070 [doi]
- Modeling the Evolution of English Noun Compounds with Feature-Rich Diachronic Compositionality PredictionFilip Miletic 0002, Sabine Schulte im Walde. 20071-20092 [doi]
- What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token PatternsMichael A. Hedderich, Anyi Wang, Raoyuan Zhao, Florian Eichin, Jonas Fischer, Barbara Plank. 20093-20123 [doi]
- V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and MeRunqi Qiao, Qiuna Tan, Guanting Dong, Minhui Wu, Jiapeng Wang, Yifan Zhang, Zhuoma Gongque, Chong Sun, Yida Xu, Yadong Xue, Ye Tian, Zhimin Bao, Lan Yang 0014, Chen Li, Honggang Zhang 0002. 20124-20150 [doi]
- Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text ComprehensionAmir Hossein Yari, Fajri Koto. 20151-20170 [doi]
- Improving Language and Modality Transfer in Translation by Character-level ModelingIoannis Tsiamas, David Dale, Marta R. Costa-Jussà. 20171-20187 [doi]
- DialUp! Modeling the Language Continuum by Adapting Models to Dialects and Dialects to ModelsNiyati Bafna, Emily Chang, Nathaniel Romney Robinson, David R. Mortensen, Kenton Murray, David Yarowsky, Hale Sirin. 20188-20233 [doi]
- AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMsNicholas E. Corrado, Julian Katz-Samuels, Adithya M. Devraj, Hyokun Yun, Chao Zhang, Yi Xu 0011, Yi Pan, Bing Yin, Trishul Chilimbi. 20234-20258 [doi]
- Modeling Complex Semantics Relation with Contrastively Fine-Tuned Relational EncodersNaïm Es-Sebbani, Esteban Marquer, Zied Bouraoui. 20259-20288 [doi]
- Error-driven Data-efficient Large Multimodal Model TuningBarry Menglong Yao, Qifan Wang, Lifu Huang. 20289-20306 [doi]
- Planning with Diffusion Models for Target-Oriented Dialogue SystemsHanwen Du, Bo Peng, Xia Ning. 20307-20329 [doi]
- Interactive and Expressive Code-Augmented Planning with Large Language ModelsAnthony Zhe Liu, Xinhe Wang 0001, Jacob Sansom, Yao Fu, Jongwook Choi, Sungryull Sohn, Jaekyeom Kim, Honglak Lee. 20330-20354 [doi]
- Synergistic Weak-Strong Collaboration by Aligning PreferencesYizhu Jiao, Xuchao Zhang, Zhaoyang Wang, Yubo Ma, Zhun Deng, Rujia Wang, Chetan Bansal, Saravan Rajmohan, Jiawei Han 0001, Huaxiu Yao. 20355-20371 [doi]
- Understanding Silent Data Corruption in LLM TrainingJeffrey Jian Ma, Hengzhi Pei, Leonard Lausen, George Karypis. 20372-20394 [doi]
- Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI FeedbackGuan-Ting Lin, Prashanth Gurunath Shivakumar, Aditya Gourav, Yile Gu, Ankur Gandhe, Hung-yi Lee, Ivan Bulyko. 20395-20411 [doi]
- Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMsJungsoo Park, Junmo Kang, Gabriel Stanovsky, Alan Ritter. 20412-20433 [doi]
- BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded DataWenkai Li, Jiarui Liu 0004, Andy Liu, Xuhui Zhou, Mona T. Diab, Maarten Sap. 20434-20471 [doi]
- Deep Temporal Reasoning in Video Language Models: A Cross-Linguistic Evaluation of Action Duration and Completion through Perfect TimesOlga Loginova, Sofía Ortega Loguinova. 20472-20502 [doi]
- Amplifying Trans and Nonbinary Voices: A Community-Centred Harm Taxonomy for LLMsEddie L. Ungless, Sunipa Dev, Cynthia L. Bennett, Rebecca Gulotta, Jasmijn Bastings, Remi Denton. 20503-20535 [doi]
- Enhancing Human Evaluation in Machine Translation with Comparative JudgementYixiao Song, Parker Riley, Daniel Deutsch, Markus Freitag. 20536-20551 [doi]
- Infogen: Generating Complex Statistical Infographics from DocumentsAkash Ghosh, Aparna Garimella, Pritika Ramu, Sambaran Bandyopadhyay, Sriparna Saha 0001. 20552-20570 [doi]
- Partial Colexifications Improve Concept EmbeddingsArne Rubehn, Johann-Mattis List. 20571-20586 [doi]
- Improved Unbiased Watermark for Large Language ModelsRuibo Chen, Yihan Wu, Junfeng Guo, Heng Huang. 20587-20601 [doi]
- MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine ProjectionYixian Shen, Qi Bi, Jia-Hong Huang, Hongyi Zhu, Andy D. Pimentel, Anuj Pathania. 20602-20618 [doi]
- Multi-Attribute Steering of Language Models via Targeted InterventionDuy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal. 20619-20634 [doi]
- AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human DemonstrationsGaurav Verma, Rachneet Kaur, Nishan Srishankar, Zhen Zeng, Tucker Balch, Manuela Veloso. 20635-20651 [doi]
- Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research PapersZhijian Xu, Yilun Zhao 0001, Manasi Patwardhan 0001, Lovekesh Vig, Arman Cohan. 20652-20706 [doi]
- On the Acquisition of Shared Grammatical Representations in Bilingual Language ModelsCatherine Arnett, Tyler A. Chang, James A. Michaelov, Ben Bergen 0001. 20707-20726 [doi]
- Using Shapley interactions to understand how models use structureDivyansh Singhvi, Diganta Misra, Andrej Erkelens, Raghav Jain, Isabel Papadimitriou, Naomi Saphra. 20727-20737 [doi]
- Adversarial TokenizationRenato Lui Geh, Zilei Shao, Guy Van den Broeck. 20738-20765 [doi]
- Classifying Unreliable Narrators with Large Language ModelsAnneliese Brei, Katharine Henry, Abhisheik Sharma, Shashank Srivastava, Snigdha Chaturvedi. 20766-20791 [doi]
- ConceptCarve: Dynamic Realization of EvidenceEylon Caplan, Dan Goldwasser. 20792-20809 [doi]
- QQSUM: A Novel Task and Model of Quantitative Query-Focused Summarization for Review-based Product Question AnsweringAn Quang Tang, Xiuzhen Zhang 0001, Minh Ngoc Dinh, Zhuang Li. 20810-20831 [doi]
- Navigating Rifts in Human-LLM Grounding: Study and BenchmarkOmar Shaikh, Hussein Mozannar, Gagan Bansal, Adam Fourney, Eric Horvitz. 20832-20847 [doi]
- Substance over Style: Evaluating Proactive Conversational Coaching AgentsVidya Srinivas, Xuhai Xu, Xin Liu 0034, Kumar Ayush, Isaac R. Galatzer-Levy, Shwetak N. Patel, Daniel McDuff, Tim Althoff. 20848-20880 [doi]
- Open-World Planning via Lifted Regression with LLM-Inferred Affordances for Embodied AgentsXiaotian Liu, Ali Pesaranghader, Hanze Li, Punyaphat Sukcharoenchaikul, Jaehong Kim, Tanmana Sadhu, Hyejeong Jeon, Scott Sanner. 20881-20897 [doi]
- (RSA)²: A Rhetorical-Strategy-Aware Rational Speech Act Framework for Figurative Language UnderstandingCesare Spinoso Di Piano, David Eric Austin, Pablo Piantanida, Jackie CK Cheung. 20898-20938 [doi]
- SYNTHIA: Novel Concept Design with Affordance CompositionHyeonjeong Ha, Xiaomeng Jin, Jeonghwan Kim, Jiateng Liu, Zhenhailong Wang, Khanh-Duy Nguyen, Ansel Blume, Nanyun Peng 0001, Kai-Wei Chang, Heng Ji 0001. 20939-20958 [doi]
- Consistent Client Simulation for Motivational Interviewing-based CounselingYizhe Yang, Palakorn Achananuparp, Heyan Huang, Jing Jiang 0001, Nicholas Gabriel Lim, Cameron Tan Shi Ern, Phey Ling Kit, Jenny Giam Xiuhui, John Pinto, Ee-Peng Lim. 20959-20998 [doi]
- AUTALIC: A Dataset for Anti-AUTistic Ableist Language In ContextNaba Rizvi, Harper Strickland, Daniel Gitelman, Alexis Morales Flores, Tristan Cooper, Aekta Kallepalli, Akshat Alurkar, Haaset Owens, Saleha Ahmedi, Isha Khirwadkar, Imani N. S. Munyaka, Nedjma Ousidhoum. 20999-21015 [doi]
- Structural Reasoning Improves Molecular Understanding of LLMYunhui Jang, Jaehyung Kim, Sungsoo Ahn. 21016-21036 [doi]
- CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic ExplorationYizhe Yang, Palakorn Achananuparp, Heyan Huang, Jing Jiang 0001, Phey Ling Kit, Nicholas Gabriel Lim, Cameron Tan Shi Ern, Ee-Peng Lim. 21037-21081 [doi]
- Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit ProfilesKuang Wang, Xianfei Li, Shenghao Yang 0001, Li Zhou 0010, Feng Jiang 0007, Haizhou Li 0001. 21082-21107 [doi]
- Targeted Syntactic Evaluation for Grammatical Error CorrectionAomi Koyama, Masato Mita, Su-Youn Yoon, Yasufumi Takama, Mamoru Komachi. 21108-21125 [doi]
- VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC VideosTingyu Song, Tongyan Hu, Guo Gan, Yilun Zhao 0001. 21126-21146 [doi]
- Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public OpinionsJoseph Suh, Erfan Jahanparast, Suhong Moon, Minwoo Kang, Serina Chang. 21147-21170 [doi]
- TESS 2: A Large-Scale Generalist Diffusion Language ModelJaesung Tae, Hamish Ivison, Sachin Kumar, Arman Cohan. 21171-21188 [doi]
- KatFishNet: Detecting LLM-Generated Korean Text through Linguistic Feature AnalysisShinwoo Park, Shubin Kim, Do-Kyung Kim, Yo-Sub Han. 21189-21222 [doi]
- Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQLHanbing Liu, Haoyang Li, Xiaokang Zhang, Ruotong Chen, Haiyong Xu, Tian Tian, Qi Qi, Jing Zhang. 21223-21261 [doi]
- On Generalization across Measurement Systems: LLMs Entail More Test-Time Compute for Underrepresented CulturesMinh Duc Bui, Kyung-Eun Park, Goran Glavas, Fabian David Schmidt, Katharina von der Wense. 21262-21276 [doi]
- CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?Aashish Anantha Ramakrishnan, Aadarsh Anantha Ramakrishnan, Dongwon Lee 0001. 21277-21297 [doi]
- Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving ReasoningYue Zhou, Barbara Di Eugenio. 21298-21310 [doi]
- Optimal Transport-Based Token Weighting scheme for Enhanced Preference OptimizationMeng Li, Guangda Huzhang, Haibo Zhang, Xiting Wang, Anxiang Zeng. 21311-21334 [doi]
- LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical StudyDongil Yang, Minjin Kim, Sunghwan Kim, Beong-woo Kwak, MinJun Park, Jinseok Hong, Woontack Woo, Jinyoung Yeo. 21335-21360 [doi]
- Beyond Frameworks: Unpacking Collaboration Strategies in Multi-Agent SystemsHaochun Wang, Sendong Zhao, Jingbo Wang, Zewen Qiang, Bing Qin 0001, Ting Liu 0001. 21361-21375 [doi]
- The Invisible Hand: Unveiling Provider Bias in Large Language Models for Code GenerationXiaoyu Zhang 0013, Juan Zhai, ShiQing Ma, Qingshuang Bao, Weipeng Jiang, Qian Wang 0002, Chao Shen 0001, Yang Liu 0003. 21376-21403 [doi]
- K/DA: Automated Data Generation Pipeline for Detoxifying Implicitly Offensive Language in KoreanMinkyeong Jeon, Hyemin Jeong, Yerang Kim, Jiyoung Kim, Jae Hyeon Cho, Byung Jun Lee. 21404-21432 [doi]
- THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine TranslationYunlong Liang, Fandong Meng, Jie Zhou. 21433-21445 [doi]
- Neuron Empirical Gradient: Discovering and Quantifying Neurons' Global Linear ControllabilityXin Zhao, Zehui Jiang, Naoki Yoshinaga 0001. 21446-21477 [doi]
- Can Third Parties Read Our Emotions?Jiayi Li, Yingfan Zhou, Pranav Narayanan Venkit, Halima Binte Islam, Sneha Arya, Shomir Wilson, Sarah Rajtmajer. 21478-21499 [doi]
- OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow MatchingNghia-Huynh Nguyen-Hieu, Ngoc Son Nguyen, Huynh Nguyen Dang, Thieu Vo, Truong-Son Hy, Van Nguyen. 21500-21517 [doi]
- World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task PlanningSiyin Wang, Zhaoye Fei, Qinyuan Cheng, Shiduo Zhang, Panpan Cai, JinLan Fu, Xipeng Qiu. 21518-21537 [doi]
- JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMsJunjie Chu 0002, Yugeng Liu, Ziqing Yang 0002, Xinyue Shen 0001, Michael Backes 0001, Yang Zhang 0016. 21538-21566 [doi]
- CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language ModelsXiaqiang Tang, Jian Li, Keyu Hu, Nan Du, Xiaolong Li, Xi Zhang, Weigao Sun, Sihong Xie. 21567-21585 [doi]
- Neural Incompatibility: The Unbridgeable Gap of Cross-Scale Parametric Knowledge Transfer in Large Language ModelsYuqiao Tan, Shizhu He, Kang Liu, Jun Zhao. 21586-21601 [doi]
- Enhancing Mathematical Reasoning in LLMs by Stepwise CorrectionZhenyu Wu 0004, Qingkai Zeng 0001, Zhihan Zhang 0001, Zhaoxuan Tan, Chao Shen, Meng Jiang 0001. 21602-21623 [doi]
- PsyDial: A Large-scale Long-term Conversational Dataset for Mental Health SupportHuachuan Qiu, Zhenzhong Lan. 21624-21655 [doi]
- Enhancing Goal-oriented Proactive Dialogue Systems via Consistency Reflection and CorrectionDidi Zhang, Yaxin Fan, Peifeng Li, Qiaoming Zhu. 21656-21672 [doi]
- Exclusion of Thought: Mitigating Cognitive Load in Large Language Models for Enhanced Reasoning in Multiple-Choice TasksQihang Fu, Yongbin Qin, Ruizhang Huang, Yanping Chen 0010, Yulin Zhou, Lintao Long. 21673-21686 [doi]
- Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine TranslationZhi Qu 0001, Yiran Wang 0006, Jiannan Mao, Chenchen Ding, Hideki Tanaka, Masao Utiyama, Taro Watanabe. 21687-21706 [doi]
- VisuoThink: Empowering LVLM Reasoning with Multimodal Tree SearchYikun Wang, Siyin Wang, Qinyuan Cheng, Zhaoye Fei, Liang Ding 0006, Qipeng Guo, Dacheng Tao, Xipeng Qiu. 21707-21719 [doi]
- Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language ModelsJianxing Liao, Junyan Xu, Yatao Sun, Maowen Tang, Sicheng He, Jingxian Liao, Shui Yu, Yun Li, Xiaohong Guan. 21720-21748 [doi]
- LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-DisjointQianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, Jing Shao. 21749-21767 [doi]
- Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and FeedbackJiakang Yuan, Xiangchao Yan, Bo Zhang 0069, Tao Chen 0003, Botian Shi, Wanli Ouyang, Yu Qiao 0001, Lei Bai 0001, Bowen Zhou 0002. 21768-21789 [doi]
- PerSphere: A Comprehensive Framework for Multi-Faceted Perspective Retrieval and SummarizationYun Luo, Yingjie Li 0008, Xiangkun Hu, Qinglin Qi, Fang Guo, Qipeng Guo, Zheng Zhang 0001, Yue Zhang 0004. 21790-21805 [doi]
- Prompt-Guided Internal States for Hallucination Detection of Large Language ModelsFujie Zhang, Peiqi Yu, Biao Yi, Baolei Zhang, Tong Li 0011, Zheli Liu. 21806-21818 [doi]
- Typology-Guided Adaptation in Multilingual ModelsNdapa Nakashole. 21819-21835 [doi]
- Don't Erase, Inform! Detecting and Contextualizing Harmful Language in Cultural Heritage CollectionsOrfeas Menis-Mastromichalakis, Jason Liartis, Kristina Rose, Antoine Isaac, Giorgos Stamou. 21836-21850 [doi]
- ECLM: Entity Level Language Model for Spoken Language Understanding with Chain of IntentShangjian Yin, Peijie Huang, Jiatian Chen, Haojing Huang, Yuhong Xu. 21851-21862 [doi]
- FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented GenerationQinggang Zhang, Zhishang Xiang, Yilin Xiao, Le Wang, Junhui Li, Xinrun Wang, Jinsong Su. 21863-21882 [doi]
- Knowledge Image Matters: Improving Knowledge-Based Visual Reasoning with Multi-Image Large Language ModelsGuanghui Ye, Huan Zhao 0003, Zhixue Zhao, Xupeng Zha, Yang Liu, Zhihua Jiang. 21883-21896 [doi]
- Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and ProactivityYupu Hao, Pengfei Cao, Zhuoran Jin, Huanxuan Liao, Yubo Chen 0001, Kang Liu 0001, Jun Zhao 0001. 21897-21935 [doi]
- GUICourse: From General Vision Language Model to Versatile GUI AgentWentong Chen, Junbo Cui, Jinyi Hu, Yujia Qin, Junjie Fang, Yue Zhao, Chongyi Wang, Jun Liu, Guirong Chen, Yupeng Huo, Yuan Yao, Yankai Lin, Zhiyuan Liu, Maosong Sun 0001. 21936-21959 [doi]
- Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM CollaborationChaeHun Park, Yujin Baek, Jaeseok Kim, Yu-Jung Heo, Du-Seong Chang, Jaegul Choo. 21960-21974 [doi]
- Maximizing the Effectiveness of Larger BERT Models for CompressionWen-Shu Fan, Su Lu, Shangyu Xing, Xin-Chun Li, De-Chuan Zhan. 21975-21990 [doi]
- Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification InferenceThanh Le-Cong, Bach Le 0001, Toby Murray. 21991-22014 [doi]
- HACo-Det: A Study Towards Fine-Grained Machine-Generated Text Detection under Human-AI CoauthoringZhixiong Su, Yichen Wang, Herun Wan, Zhaohan Zhang, Minnan Luo. 22015-22036 [doi]
- IndicSynth: A Large-Scale Multilingual Synthetic Speech Dataset for Low-Resource Indian LanguagesDivya V. Sharma, Vijval Ekbote, Anubha Gupta. 22037-22060 [doi]
- Reinforced IR: A Self-Boosting Framework For Domain-Adapted Information RetrievalChaofan Li, Jianlyu Chen, Yingxia Shao, Chaozhuo Li 0001, Quanqing Xu, Defu Lian, Zheng Liu 0011. 22061-22073 [doi]
- CoIR: A Comprehensive Benchmark for Code Information Retrieval ModelsXiangyang Li, Kuicai Dong, Yi Quan Lee, Wei Xia 0001, Hao Zhang 0048, Xinyi Dai, Yasheng Wang, Ruiming Tang. 22074-22091 [doi]
- Enhancing Multimodal Retrieval via Complementary Information Extraction and AlignmentDelong Zeng, Yuexiang Xie, Yaliang Li, Ying Shen 0001. 22092-22105 [doi]
- JoPA: Explaining Large Language Model's Generation via Joint Prompt AttributionYurui Chang, Bochuan Cao, Yujia Wang, Jinghui Chen, Lu Lin 0001. 22106-22122 [doi]
- Proxy-Driven Robust Multimodal Sentiment Analysis with Incomplete DataAoqiang Zhu, Min Hu, Xiaohua Wang 0002, Jiaoyun Yang, Yiming Tang 0001, Ning An 0001. 22123-22138 [doi]
- Not All Terms Matter: Recall-Oriented Adaptive Learning for PLM-aided Query Expansion in Open-Domain Question AnsweringXinran Chen, Ben He, Xuanang Chen, Le Sun 0001. 22139-22151 [doi]
- A Mutual Information Perspective on Knowledge Graph EmbeddingJiang Li, Xiangdong Su, Zehua Duo, Tian Lan, Xiaotao Guo, Guanglai Gao. 22152-22166 [doi]
- Aligned but Blind: Alignment Increases Implicit Bias by Reducing Awareness of RaceLihao Sun, Chengzhi Mao, Valentin Hofmann, Xuechunzi Bai. 22167-22184 [doi]
- IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference OptimizationXinghua Zhang 0001, Haiyang Yu 0003, Cheng Fu 0003, Fei Huang 0002, Yongbin Li. 22185-22200 [doi]
- ProMALex: Progressive Modular Adapters for Multi-Jurisdictional Legal Language ModelingT. Y. S. S. Santosh, Mohamed Hesham Elganayni. 22201-22217 [doi]
- Flipping Knowledge Distillation: Leveraging Small Models' Expertise to Enhance LLMs in Text MatchingMingzhe Li 0001, Jing Xiang, Qishen Zhang, Kaiyang Wan, Xiuying Chen. 22218-22229 [doi]
- Disentangling Language and Culture for Evaluating Multilingual Large Language ModelsJiahao Ying, Wei Tang 0015, Yiran Zhao 0006, Yixin Cao 0006, Yu Rong, Wenxuan Zhang 0001. 22230-22251 [doi]
- Detecting Sockpuppetry on Wikipedia Using Meta-LearningLuc Raszewski, Christine de Kock. 22252-22264 [doi]
- Diversity-oriented Data Augmentation with Large Language ModelsZaitian Wang, Jinghan Zhang 0002, Xinhao Zhang 0001, Kunpeng Liu 0001, Pengfei Wang 0008, Yuanchun Zhou. 22265-22283 [doi]
- CoreEval: Automatically Building Contamination-Resilient Datasets with Real-World Knowledge toward Reliable LLM EvaluationJingqian Zhao, Bingbing Wang, Geng Tu, Yice Zhang, Qianlong Wang, Bin Liang, Jing Li, Ruifeng Xu. 22284-22306 [doi]
- RiOT: Efficient Prompt Refinement with Residual Optimization TreeChenyi Zhou, Zhengyan Shi, Yuan Yao, Lei Liang, Huajun Chen, Qiang Zhang. 22307-22323 [doi]
- Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental DistractionsXinbei Ma, Yiting Wang, Yao Yao, Tongxin Yuan, Aston Zhang, Zhuosheng Zhang 0001, Hai Zhao 0001. 22324-22339 [doi]
- Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation BenchmarkRong-Cheng Tu, Zi-Ao Ma, Tian Lan 0003, Yuehao Zhao, Heyan Huang, Xian-Ling Mao. 22340-22361 [doi]
- Mitigating Lost-in-Retrieval Problems in Retrieval Augmented Multi-Hop Question AnsweringRongzhi Zhu, Xiangyu Liu, Zequn Sun, Yiwei Wang, Wei Hu. 22362-22375 [doi]
- TableLoRA: Low-rank Adaptation on Table Structure Understanding for Large Language ModelsXinyi He, Yihao Liu, Mengyu Zhou, Yeye He, Haoyu Dong 0001, Shi Han, Zejian Yuan, Dongmei Zhang 0001. 22376-22391 [doi]
- Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and RefinementMaosong Cao, Taolin Zhang 0003, Mo Li, Chuyu Zhang, Yunxin Liu, Conghui He, Haodong Duan, Songyang Zhang, Kai Chen 0026. 22392-22412 [doi]
- CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data SynthesisRuixiang Feng, Shen Gao, Xiuying Chen, Lisi Chen 0001, Shuo Shang. 22413-22430 [doi]
- Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency AnalysisJunzhuo Li, Bo Wang, Xiuze Zhou, Peijie Jiang, Jia Liu, Xuming Hu. 22431-22446 [doi]
- ChartLens: Fine-grained Visual Attribution in ChartsManan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Dinesh Manocha. 22447-22462 [doi]
- LESA: Learnable LLM Layer Scaling-UpYifei Yang, Zouying Cao, Xinbei Ma, Yao Yao, Zhi Chen 0006, Libo Qin 0001, Hai Zhao 0001. 22463-22476 [doi]
- MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World ConversationHaochen Xue, Feilong Tang, Ming Hu, Yexin Liu, Qidong Huang, Yulong Li, Chengzhi Liu, Zhongxing Xu, Chong Zhang 0006, Chun-Mei Feng, Yutong Xie 0001, Imran Razzak, ZongYuan Ge, Jionglong Su, Junjun He, Yu Qiao 0001. 22477-22503 [doi]
- Towards the Law of Capacity Gap in Distilling Language ModelsChen Zhang 0020, Qiuchi Li, Dawei Song 0001, Zheyu Ye, Yan Gao 0017, Yao Hu 0002. 22504-22528 [doi]
- WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher LearningRajath Rao, Adithya V. Ganesan, Oscar N. E. Kjell, Jonah Luby, Akshay Raghavan, Scott M. Feltman, Whitney Ringwald, Ryan L. Boyd, Benjamin J. Luft, Camilo J. Ruggero, Neville Ryant, Roman Kotov, H. Andrew Schwartz. 22529-22544 [doi]
- Keys to Robust Edits: From Theoretical Insights to Practical AdvancesJianhao Yan, Futing Wang, Yun Luo, Yafu Li, Yue Zhang. 22545-22560 [doi]
- Boosting LLM's Molecular Structure Elucidation with Knowledge Enhanced Tree Search ReasoningXiang Zhuang, Bin Wu 0025, Jiyu Cui, Kehua Feng, Xiaotong Li, Huabin Xing, Keyan Ding, Qiang Zhang 0026, Huajun Chen. 22561-22576 [doi]
- MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented GenerationMaría Andrea Cruz Blandón, Jayasimha Talur, Bruno Charron, Dong Liu, Saab Mansour, Marcello Federico. 22577-22595 [doi]
- The Role of Visual Modality in Multimodal Mathematical Reasoning: Challenges and InsightsYufang Liu, Yao Du, Tao Ji, Jianing Wang, Yang Liu, Yuanbin Wu, Aimin Zhou, Mengdi Zhang, Xunliang Cai. 22596-22611 [doi]
- The Essence of Contextual Understanding in Theory of Mind: A Study on Question Answering with Story CharactersChulun Zhou, Qiujing Wang, Mo Yu, Xiaoqian Yue, Rui Lu, Jiangnan Li, Yifan Zhou, Shunchi Zhang, Jie Zhou 0016, Wai Lam. 22612-22631 [doi]
- S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement LearningRuotian Ma, Peisong Wang, Cheng Liu, Xingyan Liu, Jiaqi Chen, Bang Zhang, Xin Zhou, Nan Du, Jia Li. 22632-22654 [doi]
- Advancing Collaborative Debates with Role Differentiation through Multi-Agent Reinforcement LearningHaoran Li, Ziyi Su, Yun Xue, Zhiliang Tian, Yiping Song, Minlie Huang. 22655-22666 [doi]
- Retrieval-Augmented Fine-Tuning With Preference Optimization For Visual Program GenerationDeokhyung Kang, Jeonghun Cho 0002, Yejin Jeon, Sunbin Jang, Minsub Lee, Jawoon Cho, Gary Geunbae Lee. 22667-22686 [doi]
- STRICTA: Structured Reasoning in Critical Text Assessment for Peer Review and BeyondNils Dycke, Matej Zecevic, Ilia Kuznetsov, Beatrix Suess, Kristian Kersting, Iryna Gurevych. 22687-22727 [doi]
- XDAC: XAI-Driven Detection and Attribution of LLM-Generated News Comments in KoreanWooyoung Go, Hyoungshick Kim, Alice Oh, Yongdae Kim. 22728-22750 [doi]
- CENTAUR: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer InferenceJinglong Luo, Guanzhong Chen, Yehong Zhang, Shiyu Liu, Hui Wang 0013, Yue Yu 0001, Xun Zhou 0001, Yuan Qi 0001, Zenglin Xu. 22751-22770 [doi]
- Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on TwitchPrarabdh Shukla, Wei Yin Chong, Yash Patel, Brennan Schaffner, Danish Pruthi, Arjun Nitin Bhagoji. 22771-22797 [doi]
- EdiText: Controllable Coarse-to-Fine Text Editing with Diffusion Language ModelsChe Hyun Lee, Heeseung Kim, Jiheum Yeom, Sungroh Yoon. 22798-22815 [doi]
- TUMLU: A Unified and Native Language Understanding Benchmark for Turkic LanguagesJafar Isbarov, Arofat Akhundjanova, Mammad Hajili, Kavsar Huseynova, Dmitry Gaynullin, Anar Rzayev, Osman Tursun, Aizirek Turdubaeva, Ilshat Saetov, Rinat Kharisov, Saule Belginova, Ariana Kenbayeva, Amina Alisheva, Abdullatif Köksal, Samir Rustamov, Duygu Ataman. 22816-22838 [doi]
- Look Both Ways and No Sink: Converting LLMs into Text Encoders without TrainingZiyong Lin, Haoyi Wu, Shu Wang, Kewei Tu, Zilong Zheng, Zixia Jia. 22839-22853 [doi]
- A Statistical and Multi-Perspective Revisiting of the Membership Inference Attack in Large Language ModelsBowen Chen, Namgi Han, Yusuke Miyao. 22854-22874 [doi]
- Around the World in 24 Hours: Probing LLM Knowledge of Time and PlaceCarolin Holtermann, Paul Röttger, Anne Lauscher. 22875-22897 [doi]
- Mining the uncertainty patterns of humans and models in the annotation of moral foundations and human valuesNeele Falk, Gabriella Lapesa. 22898-22921 [doi]
- "What do you call a dog that is incontrovertibly true? Dogma": Testing LLM Generalization through HumorAlessio Cocchieri, Luca Ragazzi, Paolo Italiani, Giuseppe Tagliavini, Gianluca Moro. 22922-22937 [doi]
- Towards Harmonized Uncertainty Estimation for Large Language ModelsRui Li, Jing Long, Muge Qi, Heming Xia, Lei Sha, Peiyi Wang, Zhifang Sui. 22938-22953 [doi]
- VITAL: A New Dataset for Benchmarking Pluralistic Alignment in HealthcareAnudeex Shetty, Amin Beheshti, Mark Dras, Usman Naseem. 22954-22974 [doi]
- Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social MediaZhen Sun 0001, Zongmin Zhang, Xinyue Shen 0001, Ziyi Zhang, Yule Liu, Michael Backes 0001, Yang Zhang 0016, Xinlei He 0001. 22975-23005 [doi]
- From English to Second Language Mastery: Enhancing LLMs with Cross-Lingual Continued Instruction TuningLinjuan Wu, Haoran Wei, Baosong Yang, Weiming Lu 0001. 23006-23023 [doi]
- WET: Overcoming Paraphrasing Vulnerabilities in Embeddings-as-a-Service with Linear Transformation WatermarksAnudeex Shetty, Qiongkai Xu, Jey Han Lau. 23024-23043 [doi]
- HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and ExtrapolationYuhan Chen, Ang Lv, Jian Luan, Bin Wang, Wei Liu. 23044-23056 [doi]
- One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient DeploymentsKe Yi 0003, Yuhui Xu, Heng Chang, Yuan Meng, Tong Zhang 0015, Jia Li 0009. 23057-23066 [doi]
- Beyond Logits: Aligning Feature Dynamics for Effective Knowledge DistillationGuoqiang Gong, Jiaxing Wang, Jin Xu, Deping Xiang, Zicheng Zhang, Leqi Shen, Yifeng Zhang, Junhua Shu, Zhaolong Xing, Zhen Chen, Pengzhang Liu, Ke Zhang. 23067-23077 [doi]
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse AttentionJingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo 0002, Liang Zhao, Zhengyan Zhang, Zhenda Xie, Yuxing Wei, Lean Wang, Zhiping Xiao 0001, Yuqing Wang, Chong Ruan, Ming Zhang 0004, Wenfeng Liang, Wangding Zeng. 23078-23097 [doi]
- DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in RoboticsYayu Long, Kewei Chen, Long Jin, Mingsheng Shang. 23098-23141 [doi]
- MT-RAIG: Novel Benchmark and Evaluation Framework for Retrieval-Augmented Insight Generation over Multiple TablesKwangwook Seo, Donguk Kwon, Dongha Lee 0003. 23142-23172 [doi]
- Enhancing Chain-of-Thought Reasoning with Critical Representation Fine-tuningChenxi Huang 0004, Shaotian Yan, Liang Xie 0003, Binbin Lin, Sinan Fan, Yue Xin, Deng Cai 0001, Chen Shen 0003, Jieping Ye. 23173-23195 [doi]
- Does the Emotional Understanding of LVLMs Vary Under High-Stress Environments and Across Different Demographic Attributes?Jaewook Lee 0010, Yeajin Jang, Oh-Woog Kwon, Harksoo Kim. 23196-23210 [doi]
- S2WTM: Spherical Sliced-Wasserstein Autoencoder for Topic ModelingSuman Adhya, Debarshi Kumar Sanyal. 23211-23225 [doi]
- Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional AttentionZhaoxin Feng, Jianfei Ma, Emmanuele Chersoni, Xiaojing Zhao, Xiaoyi Bao. 23226-23245 [doi]
- Tracing and Dissecting How LLMs Recall Factual Knowledge for Real World QuestionsYiqun Wang, Chaoqun Wan, Sile Hu, Yonggang Zhang 0003, Xiang Tian 0002, Yaowu Chen, Xu Shen 0001, Jieping Ye. 23246-23271 [doi]
- Employing Discourse Coherence Enhancement to Improve Cross-Document Event and Entity Coreference ResolutionXinyu Chen, Peifeng Li, Qiaoming Zhu. 23272-23286 [doi]
- Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context LearningShaobo Wang 0001, Xiangqi Jin, Ziming Wang, Jize Wang, Jiajun Zhang, Kaixin Li, Zichen Wen, Zhong Li, Conghui He, Xuming Hu, Linfeng Zhang 0001. 23287-23305 [doi]
- Synthesizing Post-Training Data for LLMs through Multi-Agent SimulationShuo Tang, Xianghe Pang, Zexi Liu, Bohan Tang, Rui Ye, Tian Jin, Xiaowen Dong 0001, Yanfeng Wang 0001, Siheng Chen. 23306-23335 [doi]
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMsYige Xu 0001, Xu Guo 0002, Zhiwei Zeng, Chunyan Miao. 23336-23351 [doi]
- FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop ReasoningSeunghee Kim, Changhyeon Kim, Taeuk Kim. 23352-23380 [doi]
- Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target AtomsMengru Wang, Ziwen Xu, Shengyu Mao, Shumin Deng, Zhaopeng Tu, Huajun Chen, Ningyu Zhang 0001. 23381-23399 [doi]
- MobiLoRA: Accelerating LoRA-based LLM Inference on Mobile Devices via Context-aware KV Cache OptimizationBorui Li, Yitao Wang, Haoran Ma, Ligeng Chen, Jun Xiao, Shuai Wang. 23400-23410 [doi]
- Language Models Resist Alignment: Evidence From Data CompressionJiaming Ji, Kaile Wang, Tianyi Alex Qiu, Boyuan Chen 0008, Jiayi Zhou, Changye Li 0003, Hantao Lou, Josef Dai, Yunhuai Liu, Yaodong Yang 0001. 23411-23432 [doi]
- Beyond the Answer: Advancing Multi-Hop QA with Fine-Grained Graph Reasoning and EvaluationQichuan Liu, Chentao Zhang, Chenfeng Zheng, Guosheng Hu, Xiaodong Li, Zhihong Zhang. 23433-23456 [doi]
- Mamba Knockout for Unraveling Factual Information FlowNir Endy, Idan Daniel Grosbard, Yuval Ran-Milo, Yonatan Slutzky, Itay Tshuva, Raja Giryes. 23457-23477 [doi]
- Small Changes, Big Impact: How Manipulating a Few Neurons Can Drastically Alter LLM AggressionJaewook Lee 0010, Junseo Jang, Oh-Woog Kwon, Harksoo Kim. 23478-23505 [doi]
- Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning ModelsHuifeng Yin, Yu Zhao, Minghao Wu, Xuanfan Ni, Bo Zeng, Huaiyu. wh, Tianqi Shi, Liangying Shao, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang. 23506-23516 [doi]
- Curiosity-Driven Reinforcement Learning from Human FeedbackHaoran Sun, Yekun Chai, Shuohuan Wang, Yu Sun, Hua Wu 0003, Haifeng Wang 0001. 23517-23534 [doi]
- T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI FeedbackZehan Wang 0001, Ke Lei, Chen Zhu, Jiawei Huang 0008, Sashuai Zhou, Luping Liu, Xize Cheng, Shengpeng Ji, Zhenhui Ye, Tao Jin 0004, Zhou Zhao 0001. 23535-23547 [doi]
- CoE: A Clue of Emotion Framework for Emotion Recognition in ConversationsZhiyu Shen, Yunhe Pang 0001, Yanghui Rao, Jianxing Yu. 23548-23563 [doi]
- MPO: Multilingual Safety Alignment via Reward Gap OptimizationWeixiang Zhao, Yulin Hu, Yang Deng 0002, Tongtong Wu, Wenxuan Zhang 0001, Jiahe Guo, An Zhang 0003, Yanyan Zhao, Bing Qin 0001, Tat-Seng Chua, Ting Liu 0001. 23564-23587 [doi]
- QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and DescriptionsSiyin Wang, Wenyi Yu, Xianzhao Chen, Xiaohai Tian, Jun Zhang 0066, Lu Lu 0015, Yu Tsao 0001, Junichi Yamagishi, Yuxuan Wang 0002, Chao Zhang 0031. 23588-23609 [doi]
- On the Relation Between Fine-Tuning, Topological Properties, and Task Performance in Sense-Enhanced EmbeddingsDeniz Ekin Yavas, Timothée Bernard, Benoît Crabbé, Laura Kallmeyer. 23610-23625 [doi]
- Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details?Parth Thakkar, Ankush Agarwal, Prasad Kasu, Pulkit Bansal, Chaitanya Devaguptapu. 23626-23648 [doi]
- Don't Half-listen: Capturing Key-part Information in Continual Instruction TuningYongquan He, Wenyuan Zhang, Xuancheng Huang, Peng Zhang, Lingxun Meng, Xiang Zhou, Ke Zeng, Xunliang Cai. 23649-23668 [doi]
- Generating Plausible Distractors for Multiple-Choice Questions via Student Choice PredictionYooseop Lee, Suin Kim, Yohan Jo. 23669-23692 [doi]
- Exploring Explanations Improves the Robustness of In-Context LearningUkyo Honda, Tatsushi Oka. 23693-23714 [doi]
- Prediction Hubs are Context-Informed Frequent Tokens in LLMsBeatrix Miranda Ginn Nielsen, Iuri Macocco, Marco Baroni. 23715-23745 [doi]
- Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling LawQiming Ge, Shuhao Xing, Songyang Gao, Yunhua Zhou, Yicheng Zou, Songyang Zhang, Zhi Chen 0006, Hang Yan 0001, Qi Zhang 0001, Qipeng Guo, Kai Chen 0026. 23746-23761 [doi]
- CRUXEVAL-X: A Benchmark for Multilingual Code Reasoning, Understanding and ExecutionRuiyang Xu, Jialun Cao, Yaojie Lu 0001, Ming Wen 0001, Hongyu Lin, Xianpei Han, Ben He, Shing-Chi Cheung, Le Sun 0001. 23762-23779 [doi]
- Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with GraphsHaozhen Zhang, Tao Feng, Jiaxuan You. 23780-23799 [doi]
- Rubrik's Cube: Testing a New Rubric for Evaluating Explanations on the CUBE datasetDiana Galván-Sosa, Gabrielle Gaudeau, Pride Kavumba, Yunmeng Li, Hongyi Gu, Zheng Yuan 0003, Keisuke Sakaguchi, Paula Buttery. 23800-23839 [doi]
- A Dual-Mind Framework for Strategic and Expressive Negotiation AgentYutong Liu, Lida Shi, Rui Song 0008, Hao Xu 0012. 23840-23860 [doi]
- Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language ModelsJunjie Wu, Gefei Gu, Yanan Zheng, Dit-Yan Yeung, Arman Cohan. 23861-23880 [doi]
- Revisiting Scaling Laws for Language Models: The Role of Data Quality and Training StrategiesZhengyu Chen 0001, Siqi Wang, Teng Xiao, Yudong Wang, Shiqi Chen, Xunliang Cai, Junxian He, Jingang Wang. 23881-23899 [doi]
- Limited Generalizability in Argument Mining: State-Of-The-Art Models Learn Datasets, Not ArgumentsMarc Feger, Katarina Boland, Stefan Dietze. 23900-23915 [doi]
- Enhancing Machine Translation with Self-Supervised Preference DataHaoxiang Sun, Ruize Gao, Pei Zhang 0011, Baosong Yang, Rui Wang 0015. 23916-23934 [doi]
- Unveil: Unified Visual-Textual Integration and Distillation for Multi-modal Document RetrievalHao Sun 0015, Yingyan Hou, Jiayan Guo, Bo Wang, Chunyu Yang, Jinsong Ni, Yan Zhang. 23935-23945 [doi]
- Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration PitfallsAnte Wang, Linfeng Song, Ye Tian, Dian Yu 0001, Haitao Mi, Xiangyu Duan, Zhaopeng Tu, Jinsong Su, Dong Yu 0001. 23946-23959 [doi]
- MEXMA: Token-level objectives improve sentence representationsJoão Maria Janeiro, Benjamin Piwowarski, Patrick Gallinari, Loïc Barrault. 23960-23995 [doi]
- Uncertainty-Aware Iterative Preference Optimization for Enhanced LLM ReasoningLei Li, Hehuan Liu, Yaxin Zhou, Zhaoyang Gui, Xudong Weng, Yi Yuan, Zheng Wei, Zang Li. 23996-24012 [doi]
- AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent CollaborationZhexuan Wang, Yutong Wang, Xuebo Liu 0002, Liang Ding 0006, Miao Zhang, Jie Liu 0001, Min Zhang 0005. 24013-24035 [doi]
- Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human StatesYang Xiao, Jiashuo Wang, Qiancheng Xu, Changhe Song, Chunpu Xu, Yi Cheng, Wenjie Li, Pengfei Liu. 24036-24057 [doi]
- Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language ModelsBo Zeng, Chenyang Lyu, Sinuo Liu, Mingyan Zeng, Minghao Wu, Xuanfan Ni, Tianqi Shi, Yu Zhao, Yefeng Liu, Chenyu Zhu, Ruizhe Li, Jiahui Geng, Qing Li, Yu Tong, Longyue Wang, Weihua Luo, Kaifu Zhang. 24058-24072 [doi]
- Representation Bending for Large Language Model SafetyAshkan Yousefpour, Taeheon Kim, Ryan Sungmo Kwon, Seungbeen Lee, Wonje Jeung, Seungju Han 0002, Alvin Wan, Harrison Ngan, Youngjae Yu, Jonghyun Choi. 24073-24098 [doi]
- Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal RepresentationsChenghao Xiao, Hou Pong Chan, Hao Zhang 0048, Mahani Aljunied, Lidong Bing, Noura Al Moubayed, Yu Rong. 24099-24115 [doi]
- Enhancing Retrieval-Augmented Generation via Evidence Tree SearchHao Sun, Hengyi Cai, Yuchen Li, Xuanbo Fan, Xiaochi Wei, Shuaiqiang Wang, Yan Zhang, Dawei Yin. 24116-24127 [doi]
- HalluLens: LLM Hallucination BenchmarkYejin Bang, Ziwei Ji 0001, Alan Schelten, Anthony Hartshorn, Tara Fowler, Cheng Zhang, Nicola Cancedda, Pascale Fung. 24128-24156 [doi]
- DEEPER Insight into Your User: Directed Persona Refinement for Dynamic Persona ModelingAili Chen, Chengyu Du, Jiangjie Chen, Jinghan Xu, Yikai Zhang 0004, Siyu Yuan, Zulong Chen, Liangyue Li, Yanghua Xiao. 24157-24180 [doi]
- Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language ModelsJie Liu 0044, Wenxuan Wang 0001, Yihang Su, Jingyuan Huang, Yudi Zhang 0005, Cheng-Yi Li, Wenting Chen, Xiaohan Xing, Kao-Jung Chang, LinLin Shen, Michael R. Lyu. 24181-24201 [doi]
- InstructPart: Task-Oriented Part Segmentation with Instruction ReasoningZifu Wan, Yaqi Xie 0001, Ce Zhang 0009, Zhiqiu Lin, Zihan Wang, Simon Stepputtis, Deva Ramanan, Katia P. Sycara. 24202-24227 [doi]
- GRaMPa: Subword Regularisation by Skewing Uniform Segmentation Distributions with an Efficient Path-counting Markov ModelThomas Bauwens, David Kaczér, Miryam de Lhoneux. 24228-24257 [doi]
- Evaluating the Evaluation of Diversity in Commonsense GenerationTianhui Zhang, Bei Peng, Danushka Bollegala. 24258-24275 [doi]
- Generate First, Then Sample: Enhancing Fake News Detection with LLM-Augmented Reinforced SamplingZhao Tong, Yimeng Gu, Huidong Liu, Qiang Liu, Shu Wu, Haichao Shi, Xiao-Yu Zhang. 24276-24290 [doi]
- ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated DataYu Zhang, Ruijie Yu, Jidong Tian, Feng Zhu 0006, Jiapeng Liu, Xiaokang Yang 0001, Yaohui Jin, Yanyan Xu 0002. 24291-24314 [doi]
- Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary PerceptionShiyu Ni, Keping Bi, Jiafeng Guo, Lulu Yu, Baolong Bi, Xueqi Cheng. 24315-24329 [doi]
- ALGEN: Few-shot Inversion Attacks on Textual Embeddings via Cross-Model Alignment and GenerationYiyi Chen 0002, Qiongkai Xu, Johannes Bjerva. 24330-24348 [doi]
- Decoding on Graphs: Faithful and Sound Reasoning on Knowledge Graphs through Generation of Well-Formed ChainsKun Li 0003, Tianhua Zhang, Xixin Wu, Hongyin Luo, James R. Glass, Helen M. Meng. 24349-24364 [doi]
- STaR-SQL: Self-Taught Reasoner for Text-to-SQLMingqian He, Yongliang Shen 0001, Wenqi Zhang, Qiuying Peng, Jun Wang, Weiming Lu. 24365-24375 [doi]
- Fairness Beyond Performance: Revealing Reliability Disparities Across Groups in Legal NLPT. Y. S. S. Santosh, Irtiza Chowdhury. 24376-24390 [doi]
- Beyond Similarity: A Gradient-based Graph Method for Instruction Tuning Data SelectionYang Zhao 0023, Li Du, Xiao Ding, Yangou Ouyang, Hepeng Wang, Kai Xiong 0002, Jinglong Gao, Zhouhao Sun, Dongliang Xu, Qing Yang 0033, Dongchen Li, Bing Qin 0001, Ting Liu 0001. 24391-24404 [doi]
- FastMCTS: A Simple Sampling Strategy for Data SynthesisPeiji Li, Kai Lv 0001, Yunfan Shao, Yichuan Ma, Linyang Li, Xiaoqing Zheng, Xipeng Qiu, Qipeng Guo. 24405-24422 [doi]
- Dialogue-RAG: Enhancing Retrieval for LLMs via Node-Linking Utterance RewritingQiwei Li 0002, Teng Xiao, Zuchao Li, Ping Wang 0028, Mengjia Shen, Hai Zhao 0001. 24423-24438 [doi]
- Using Information Theory to Characterize Prosodic Typology: The Case of Tone, Pitch-Accent and Stress-AccentEthan Wilcox, Cui Ding, Giovanni Acampa, Tiago Pimentel, Alex Warstadt, Tamar I. Regev. 24439-24451 [doi]
- Evaluating LLMs for Portuguese Sentence Simplification with Linguistic InsightsArthur Mariano Rocha De Azevedo Scalercio, Elvis A. De Souza, Maria José Bocorny Finatto, Aline Paes. 24452-24477 [doi]
- LaTIM: Measuring Latent Token-to-Token Interactions in Mamba ModelsHugo Pitorro, Marcos Vinícius Treviso. 24478-24493 [doi]
- Improving Low-Resource Morphological Inflection via Self-Supervised ObjectivesAdam Wiemerslage, Katharina von der Wense. 24494-24510 [doi]
- Don't Reinvent the Wheel: Efficient Instruction-Following Text Embedding based on Guided Space TransformationYingchaojie Feng, Yiqun Sun, Yandong Sun, Minfeng Zhu 0001, Qiang Huang, Anthony Kum Hoe Tung, Wei Chen 0001. 24511-24525 [doi]
- BOOKCOREF: Coreference Resolution at Book ScaleGiuliano Martinelli 0001, Tommaso Bonomo, Pere-Lluís Huguet Cabot, Roberto Navigli. 24526-24544 [doi]
- OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal RetrievalWei Yang, Jingjing Fu, Rui Wang 0028, Jinyu Wang, Lei Song 0001, Jiang Bian 0002. 24545-24563 [doi]
- Alleviating Hallucinations from Knowledge Misalignment in Large Language Models via Selective Abstention LearningLei Huang 0021, Xiaocheng Feng, Weitao Ma, Yuchun Fan, Xiachong Feng, Yuxuan Gu, Yangfan Ye, Liang Zhao, Weihong Zhong, Baoxin Wang, Dayong Wu, Guoping Hu, Lingpeng Kong, Tong Xiao, Ting Liu 0001, Bing Qin 0001. 24564-24579 [doi]
- Retrospective Learning from InteractionsZizhao Chen, Mustafa Omer Gul, Yiwei Chen, Gloria Geng, Anne Wu, Yoav Artzi. 24580-24606 [doi]
- Personalized Generation In Large Model Era: A SurveyYiyan Xu, Jinghao Zhang, Alireza Salemi, Xinting Hu, Wenjie Wang 0007, Fuli Feng, Hamed Zamani, Xiangnan He 0001, Tat-Seng Chua. 24607-24649 [doi]
- Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM ReasoningJunqi Gao, Xiang Zou, Ying Ai, Dong Li, Yichen Niu, Biqing Qi, Jianxing Liu. 24650-24668 [doi]
- SOTOPIA-Ω: Dynamic Strategy Injection Learning and Social Instruction Following Evaluation for Social AgentsWenyuan Zhang, Tianyun Liu, Mengxiao Song, Xiaodong Li, Tingwen Liu. 24669-24697 [doi]
- Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet'Shanchao Liang, Nan Jiang 0012, Yiran Hu, Lin Tan 0001. 24698-24717 [doi]
- Leveraging In-Context Learning for Political Bias Testing of LLMsPatrick Haller 0001, Jannis Vamvas, Rico Sennrich, Lena Ann Jäger. 24718-24738 [doi]
- ACORD: An Expert-Annotated Retrieval Dataset for Legal Contract DraftingSteven H. Wang, Maksim Zubkov, Kexin Fan, Sarah Harrell, Yuyang Sun, Wei Chen, Andreas Plesner, Roger Wattenhofer. 24739-24762 [doi]
- LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution ShiftsQibing Ren, Hao Li 0069, Dongrui Liu, Zhanxu Xie, Xiaoya Lu, Yu Qiao 0001, Lei Sha, Junchi Yan, Lizhuang Ma, Jing Shao. 24763-24785 [doi]
- WAFFLE: Fine-tuning Multi-Modal Model for Automated Front-End DevelopmentShanchao Liang, Nan Jiang 0012, Shangshu Qian, Lin Tan 0001. 24786-24802 [doi]
- Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward PassesBryan R. Christ, Zachary Gottesman, Jonathan Kropko, Thomas Hartvigsen. 24803-24840 [doi]
- Multiple LLM Agents Debate for Equitable Cultural AlignmentDayeon Ki, Rachel Rudinger, Tianyi Zhou 0001, Marine Carpuat. 24841-24877 [doi]
- RefreshKV: Updating Small KV Cache During Long-form GenerationFangyuan Xu, Tanya Goyal, Eunsol Choi. 24878-24893 [doi]
- SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic EmbeddingsWeikai Lu, Hao Peng, Huiping Zhuang, Cen Chen 0002, Ziqian Zeng. 24894-24913 [doi]
- Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm PerspectiveYiyao Yu, Yuxiang Zhang, Dongdong Zhang 0001, Xiao Liang, Hengyuan Zhang, Xingxing Zhang 0002, Mahmoud Khademi, Hany Hassan Awadalla, Junjie Wang, Yujiu Yang, Furu Wei. 24914-24937 [doi]
- Language Models Grow Less Humanlike beyond Phase TransitionTatsuya Aoyama, Ethan Wilcox. 24938-24958 [doi]
- PCoT: Persuasion-Augmented Chain of Thought for Detecting Fake News and Social Media DisinformationArkadiusz Modzelewski, Witold Sosnowski, Tiziano Labruna, Adam Wierzbicki, Giovanni Da San Martino. 24959-24983 [doi]
- Coordinating Chaos: A Structured Review of Linguistic Coordination MethodologiesBenjamin Roger Litterer, David Jurgens, Dallas Card. 24984-24999 [doi]
- iNews: A Multimodal Dataset for Modeling Personalized Affective Responses to NewsTiancheng Hu, Nigel Collier. 25000-25040 [doi]
- Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal GesturesAkhila Yerukola, Saadia Gabriel, Nanyun Peng 0001, Maarten Sap. 25041-25080 [doi]
- 500xCompressor: Generalized Prompt Compression for Large Language ModelsZongqian Li, Yixuan Su, Nigel Collier. 25081-25091 [doi]
- Estimating Privacy Leakage of Augmented Contextual Knowledge in Language ModelsJames Flemings, Bo Jiang, Wanrong Zhang 0003, Zafar Takhirov, Murali Annavaram. 25092-25108 [doi]
- Document-Level Event-Argument Data Augmentation for Challenging Role TypesJoseph Gatto, Omar Sharif, Parker Seegmiller, Sarah Masud Preum. 25109-25131 [doi]
- Mapping the Podcast Ecosystem with the Structured Podcast Research CorpusBenjamin Roger Litterer, David Jurgens, Dallas Card. 25132-25154 [doi]
- Unravelling the Logic: Investigating the Generalisation of Transformers in Numerical Satisfiability ProblemsTharindu Madusanka, Marco Valentino, Iqra Zahid, Ian Pratt-Hartmann, Riza Batista-Navarro. 25155-25168 [doi]
- The Nature of NLP: Analyzing Contributions in NLP PapersAniket Pramanick, Yufang Hou 0001, Saif M. Mohammad, Iryna Gurevych. 25169-25191 [doi]
- GeLLM³O: Generalizing Large Language Models for Multi-property Molecule OptimizationVishal Dey, Xiao Hu, Xia Ning. 25192-25221 [doi]
- Follow-up Question Generation For Enhanced Patient-Provider ConversationsJoseph Gatto, Parker Seegmiller, Timothy E. Burdick, Inas S. Khayal, Sarah Delozier, Sarah Masud Preum. 25222-25240 [doi]
- Unveiling Privacy Risks in LLM Agent MemoryBo Wang 0069, Weiyi He, Shenglai Zeng, Zhen Xiang, Yue Xing 0002, Jiliang Tang, Pengfei He. 25241-25260 [doi]
- Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality EstimationEmmanouil Zaranis, Giuseppe Attanasio, Sweta Agrawal, André F. T. Martins. 25261-25284 [doi]
- Language Constrained Multimodal Hyper Adapter For Many-to-Many Multimodal SummarizationNayu Liu, Fanglong Yao, Haoran Luo, Yong Yang 0001, Chen Tang, Bo Lv. 25285-25298 [doi]
- PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward ModelsMingYang Song, Zhaochen Su, Xiaoye Qu, Jiawei Zhou, Yu Cheng 0001. 25299-25346 [doi]
- Efficient Ensemble for Fine-tuning Language Models on Multiple DatasetsDongyue Li, Ziniu Zhang, Lu Wang 0008, Hongyang R. Zhang. 25347-25364 [doi]
- Library-Like Behavior In Language Models is Enhanced by Self-Referencing Causal CyclesMunachiso Nwadike, Zangir Iklassov, Toluwani Aremu, Tatsuya Hiraoka, Benjamin Heinzerling, Velibor Bojkovic, Hilal AlQuabeh, Martin Takác 0001, Kentaro Inui. 25365-25377 [doi]
- Shaping the Safety Boundaries: Understanding and Defending Against Jailbreaks in Large Language ModelsLang Gao, Jiahui Geng, Xiangliang Zhang 0001, Preslav Nakov, Xiuying Chen. 25378-25398 [doi]
- ASPERA: A Simulated Environment to Evaluate Planning for Complex Action ExecutionAlexandru Coca, Mark Gaynor, Zhenxing Zhang, Jianpeng Cheng 0001, Bo-Hsiang Tseng, Peter Boothroyd, Héctor Martínez Alonso, Diarmuid Ó Séaghdha, Anders Johannsen. 25399-25434 [doi]
- ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion FrameworkJiahao Yuan, Zixiang Di, Zhiqing Cui, Guisong Yang, Usman Naseem. 25435-25449 [doi]
- SARA: Salience-Aware Reinforced Adaptive Decoding for Large Language Models in Abstractive SummarizationNayu Liu, Junnan Zhu, Yiming Ma, Zhicong Lu, Wenlei Xu, Yong Yang, Jiang Zhong, Kaiwen Wei. 25450-25463 [doi]
- Embedding-Converter: A Unified Framework for Cross-Model Embedding TransformationJinsung Yoon, Sercan Ö. Arik. 25464-25482 [doi]
- Improving Automatic Evaluation of Large Language Models (LLMs) in Biomedical Relation Extraction via LLMs-as-the-JudgeMd. Tahmid Rahman Laskar, Israt Jahan, Elham Dolatabadi, Chun Peng, Enamul Hoque, Jimmy Huang 0001. 25483-25497 [doi]
- Answering Complex Geographic Questions by Adaptive Reasoning with Visual Context and External Commonsense KnowledgeFan Li, Jianxing Yu, Jielong Tang, Wenqing Chen, Hanjiang Lai, Yanghui Rao, Jian Yin 0001. 25498-25514 [doi]
- Safety Alignment via Constrained Knowledge UnlearningZesheng Shi, Yucheng Zhou 0001, Jing Li 0034, Yuxin Jin, Yu Li 0007, Daojing He, Fangming Liu, Saleh Alharbi, Jun Yu, Min Zhang 0005. 25515-25529 [doi]
- Response Wide Shut? Surprising Observations in Basic Vision Language Model CapabilitiesShivam Chandhok, Wan-Cyuan Fan, Vered Shwartz, Vineeth N. Balasubramanian, Leonid Sigal. 25530-25545 [doi]
- EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language ModelsZekun Wang 0001, Minghua Ma, Zexin Wang, Rongchuan Mu, Liping Shan, Ming Liu 0004, Bing Qin 0001. 25546-25572 [doi]
- Pre-Training Curriculum for Multi-Token Prediction in Language ModelsAnsar Aynetdinov, Alan Akbik. 25573-25588 [doi]
- Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging TasksXingxuan Li, Weiwen Xu, Ruochen Zhao, Fangkai Jiao, Shafiq Joty, Lidong Bing. 25589-25604 [doi]
- On Many-Shot In-Context Learning for Long-Context EvaluationKaijian Zou, Muhammad Khalifa, Lu Wang 0008. 25605-25639 [doi]
- HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain TasksZhilin Wang, Jiaqi Zeng, Olivier Delalleau, Daniel Egert, Ellie Evans, Hoo-Chang Shin, Felipe Soares, Yi Dong, Oleksii Kuchaiev. 25640-25662 [doi]
- CulturalBench: A Robust, Diverse and Challenging Benchmark for Measuring LMs' Cultural Knowledge Through Human-AI Red-TeamingYu-Ying Chiu, Liwei Jiang, Bill Yuchen Lin, Chan Young Park, Shuyue Stella Li, Sahithya Ravi, Mehar Bhatia, Maria Antoniak, Yulia Tsvetkov, Vered Shwartz, Yejin Choi 0001. 25663-25701 [doi]
- Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based FinetuningMohit Raghavendra, Junmo Kang, Alan Ritter. 25702-25720 [doi]
- All That Glitters is Not Novel: Plagiarism in AI Generated ResearchTarun Gupta, Danish Pruthi. 25721-25738 [doi]
- Writing Like the Best: Exemplar-Based Expository Text GenerationYuxiang Liu, Kevin Chen-Chuan Chang. 25739-25764 [doi]
- Temporal Relation Extraction in Clinical Texts: A Span-based Graph Transformer ApproachRochana Chaturvedi, Peyman Baghershahi, Sourav Medya, Barbara Di Eugenio. 25765-25788 [doi]
- Finding A Voice: Exploring the Potential of African American Dialect and Voice Generation for ChatbotsSarah E. Finch, Ellie S. Paek, Ikseon Choi, Jinho D. Choi. 25789-25806 [doi]
- Delta-KNN: Improving Demonstration Selection in In-Context Learning for Alzheimer's Disease DetectionChuyuan Li, Raymond Li, Thalia Shoshana Field, Giuseppe Carenini. 25807-25826 [doi]
- Help Me Write a Story: Evaluating LLMs' Ability to Generate Writing FeedbackHannah Rashkin, Elizabeth Clark, Fantine Huot, Mirella Lapata. 25827-25847 [doi]
- Language Fusion for Parameter-Efficient Cross-lingual TransferPhilipp Borchert, Ivan Vulic, Marie-Francine Moens, Jochen De Weerdt. 25848-25868 [doi]
- Culture is Not Trivia: Sociocultural Theory for Cultural NLPNaitian Zhou, David Bamman, Isaac L. Bleaman. 25869-25886 [doi]
- AAD-LLM: Neural Attention-Driven Auditory Scene UnderstandingXilin Jiang, Sukru Samet Dindar, Vishal Choudhari, Stephan Bickel, Ashesh D. Mehta, Guy M. McKhann II, Daniel Friedman, Adeen Flinker, Nima Mesgarani. 25887-25909 [doi]
- Do Language Models Have Semantics? On the Five Standard PositionsAnders Søgaard. 25910-25922 [doi]
- Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation SystemsMyra Cheng, Su Lin Blodgett, Alicia DeVrio, Lisa Egede, Alexandra Olteanu. 25923-25948 [doi]
- Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired UsersAntonia Karamolegkou, Malvina Nikandrou, Georgios Pantazopoulos, Danae Sanchez Villegas, Phillip Rust, Ruchira Dhar, Daniel Hershcovich, Anders Søgaard. 25949-25982 [doi]
- HumT DumT: Measuring and controlling human-like language in LLMsMyra Cheng, Sunny Yu, Dan Jurafsky. 25983-26008 [doi]
- ChatBench: From Static Benchmarks to Human-AI EvaluationSerina Chang, Ashton Anderson, Jake M. Hofman. 26009-26038 [doi]
- Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled PreferencesMohammad Saqib Hasan, Saikat Chakraborty, Santu Karmaker, Niranjan Balasubramanian. 26039-26057 [doi]
- Anything Goes? A Crosslinguistic Study of (Im)possible Language Learning in LMsXiulin Yang, Tatsuya Aoyama, Yuekun Yao, Ethan Wilcox. 26058-26077 [doi]
- Ranking Unraveled: Recipes for LLM Rankings in Head-to-Head AI CombatRoland Daynauth, Christopher Clarke, Krisztián Flautner, Lingjia Tang, Jason Mars. 26078-26091 [doi]
- LLM Agents Making Agent ToolsGeorg Wölflein, Dyke Ferber, Daniel Truhn, Ognjen Arandjelovic, Jakob Nikolas Kather. 26092-26130 [doi]
- CrafText Benchmark: Advancing Instruction Following in Complex Multimodal Open-Ended WorldZoya Volovikova, Gregory Gorbov, Petr Kuderov, Aleksandr Panov, Alexey Skrynnik. 26131-26151 [doi]
- QG-SMS: Enhancing Test Item Analysis via Student Modeling and SimulationBang Nguyen, Tingting Du, Mengxia Yu, Lawrence Angrave, Meng Jiang 0001. 26152-26168 [doi]
- Causal Graph based Event Reasoning using Semantic Relation ExpertsMahnaz Koupaee, Xueying Bai, Mudan Chen, Greg Durrett, Nathanael Chambers, Niranjan Balasubramanian. 26169-26199 [doi]
- LogicPro: Improving Complex Logical Reasoning via Program-Guided LearningJin Jiang, Yuchen Yan, Yang Liu, Jianing Wang, Shuai Peng, Xunliang Cai, Yixin Cao, Mengdi Zhang, Liangcai Gao. 26200-26218 [doi]
- Do LLMs Understand Dialogues? A Case Study on Dialogue ActsAyesha Qamar, Jonathan Tong, Ruihong Huang. 26219-26237 [doi]
- Research Borderlands: Analysing Writing Across Research CulturesShaily Bhatt, Tal August, Maria Antoniak. 26238-26266 [doi]
- CEAES: Bidirectional Reinforcement Learning Optimization for Consistent and Explainable Essay AssessmentXia Li, Wenjing Pan. 26267-26279 [doi]
- DeAL: Decoding-time Alignment for Large Language ModelsJames Y. Huang, Sailik Sengupta, Daniele Bonadiman, Yi-An Lai, Arshit Gupta, Nikolaos Pappas 0004, Saab Mansour, Katrin Kirchhoff, Dan Roth. 26280-26300 [doi]
- Cultural Bias Matters: A Cross-Cultural Benchmark Dataset and Sentiment-Enriched Model for Understanding Multimodal MetaphorsSenqi Yang, Dongyu Zhang 0001, Jing Ren 0001, Ziqi Xu 0001, Xiuzhen Zhang 0001, Yiliao Song, Hongfei Lin, Feng Xia 0001. 26301-26317 [doi]
- OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality InteractionHaonan Zhang, Run Luo, Xiong Liu, Yuchuan Wu, Ting-En Lin, Pengpeng Zeng, Qiang Qu, Feiteng Fang, Min Yang 0007, Lianli Gao, Jingkuan Song, Fei Huang 0002, Yongbin Li. 26318-26331 [doi]
- Mixtures of In-Context LearnersGiwon Hong, Emile van Krieken, Edoardo Maria Ponti, Nikolay Malkin, Pasquale Minervini. 26332-26351 [doi]
- Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text GenerationYuxuan Zhou 0004, Margret Keuper, Mario Fritz. 26352-26365 [doi]
- RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge InjectionWenjun Hou, Yi Cheng, Kaishuai Xu, Heng Li 0010, Yan Hu, Wenjie Li, Jiang Liu 0001. 26366-26381 [doi]
- Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text UpdatesJaewoo Ahn, Heeseung Yun, Dayoon Ko, Gunhee Kim. 26382-26402 [doi]
- Attention Speaks Volumes: Localizing and Mitigating Bias in Language ModelsRishabh Adiga, Besmira Nushi, Varun Chandrasekaran. 26403-26423 [doi]
- MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teamingWeiyang Guo, Jing Li 0034, Wenya Wang, Yu Li 0007, Daojing He, Jun Yu, Min Zhang 0005. 26424-26442 [doi]
- The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early ExitHuixue Zhou, Hengrui Gu 0002, Zaifu Zhan, Xi Liu, Kaixiong Zhou, Yongkang Xiao, Mingfu Liang, Srinivas Prasad Govindan, Piyush Chawla, Jiyan Yang, Xiangfei Meng, Huayu Li, Buyun Zhang, Liang Luo, Wen-Yen Chen, Yiping Han, Bo Long, Rui Zhang 0028, Tianlong Chen 0001. 26443-26458 [doi]
- Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model MergingHaobo Zhang 0002, Jiayu Zhou. 26459-26472 [doi]
- BIG-Bench Extra HardMehran Kazemi, Bahare Fatemi, Hritik Bansal, John Palowitch, Chrysovalantis Anastasiou, Sanket Vaibhav Mehta, Lalit K. Jain, Virginia Aglietti, Disha Jindal, Peter Chen, Nishanth Dikkala, Gladys Tyen, Xin Liu 0034, Uri Shalit, Silvia Chiappa, Kate Olszewska, Yi Tay, Vinh Q. Tran 0002, Quoc V. Le, Orhan Firat. 26473-26501 [doi]
- CSTree-SRI: Introspection-Driven Cognitive Semantic Tree for Multi-Turn Question Answering over Extra-Long ContextsZhaowen Wang, Xiang Wei, Kangshao Du, Yiting Zhang, Libo Qin, Yingjie Xia, Li Kuang. 26502-26525 [doi]
- InductionBench: LLMs Fail in the Simplest Complexity ClassWenyue Hua, Tyler Wong, Fei Sun, Liangming Pan, Adam Jardine, William Yang Wang. 26526-26546 [doi]
- RATIONALYST: Pre-training Process-Supervision for Improving ReasoningDongwei Jiang, Guoxuan Wang, Yining Lu, Andrew Wang, Jingyu Zhang, Chuyu Liu, Benjamin Van Durme, Daniel Khashabi. 26547-26566 [doi]
- Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine TranslationAndong Chen, Yuchen Song, Kehai Chen, Xuefeng Bai 0001, Muyun Yang, Liqiang Nie, Jie Liu 0001, Tiejun Zhao, Min Zhang 0005. 26567-26583 [doi]
- Advancing SMoE for Continuous Domain Adaptation of MLLMs: Adaptive Router and Domain-Specific LossLiang Zhang, Ziyao Lu, Fandong Meng, Hui Li, Jie Zhou, Jinsong Su. 26584-26602 [doi]
- Multi-document Summarization through Multi-document Event Relation Graph Reasoning in LLMs: a case study in Framing Bias MitigationYuanyuan Lei 0001, Ruihong Huang. 26603-26619 [doi]
- Who Writes What: Unveiling the Impact of Author Roles on AI-generated Text DetectionJiatao Li, Xiaojun Wan 0001. 26620-26658 [doi]
- RoCoFT: Efficient Finetuning of Large Language Models with Row-Column UpdatesMd. Kowsher, Tara Esmaeilbeig, Chun-Nam Yu, Chen Chen, Mojtaba Soltanalian, Niloofar Yousefi 0001. 26659-26678 [doi]
- Scaling Laws and Efficient Inference for Ternary Language ModelsTejas Vaidhya, Ayush Kaushal, Vineet Jain, Francis Couture Harpin, Prashant Shishodia, Majid Behbahani, Yuriy Nevmyvaka, Irina Rish. 26679-26710 [doi]
- Exploring the Impact of Instruction-Tuning on LLM's Susceptibility to MisinformationKyubeen Han, Junseo Jang, Hongjin Kim, Geunyeong Jeong, Harksoo Kim. 26711-26731 [doi]
- Do Language Models Understand Honorific Systems in Javanese?Mohammad Rifqi Farhansyah, Iwan Darmawan, Adryan Kusumawardhana, Genta Indra Winata, Alham Fikri Aji, Derry Tanti Wijaya. 26732-26754 [doi]
- Generative Reward Modeling via Synthetic Criteria Preference LearningXiaobo Liang, Haoke Zhang, Juntao Li, Kehai Chen, Qiaoming Zhu, Min Zhang 0005. 26755-26769 [doi]
- Exploring Multimodal Relation Extraction of Hierarchical Tabular Data with Multi-task LearningXinyu Zhang, Aibo Song, Jingyi Qiu, Jiahui Jin, Tianbo Zhang, Xiaolin Fang 0001. 26770-26781 [doi]
- A Self-Denoising Model for Robust Few-Shot Relation ExtractionLiang Zhang, Yang Zhang, Ziyao Lu, Fandong Meng, Jie Zhou, Jinsong Su. 26782-26797 [doi]
- QuASAR: A Question-Driven Structure-Aware Approach for Table-to-Text GenerationWeijie Liu, Yibin Zheng, Fang Kong. 26798-26812 [doi]
- Automated Structured Radiology Report GenerationJean-Benoit Delbrouck, Justin Xu, Johannes Moll, Alois Thomas, Zhihong Chen, Sophie Ostmeier, Asfandyar Azhar, Kelvin Zhenghao Li, Andrew Johnston, Christian Bluethgen, Eduardo Pontes Reis, Mohamed S. Muneer, Maya Varma, Curtis P. Langlotz. 26813-26829 [doi]
- LPOI: Listwise Preference Optimization for Vision Language ModelsFatemeh Pesaran zadeh, Yoojin Oh, Gunhee Kim. 26830-26844 [doi]
- Predicting Through Generation: Why Generation Is Better for PredictionMd. Kowsher, Nusrat Jahan Prottasha, Prakash Bhat, Chun-Nam Yu, Mojtaba Soltanalian, Ivan Garibay, Ozlem O. Garibay, Chen Chen, Niloofar Yousefi 0001. 26845-26871 [doi]
- "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM QuantizationEldar Kurtic, Alexandre Noll Marques, Shubhra Pandit, Mark Kurtz, Dan Alistarh. 26872-26886 [doi]
- StitchLLM: Serving LLMs, One Block at a TimeBodun Hu, Shuozhe Li, Saurabh Agarwal, Myungjin Lee, Akshay Jajoo, Jiamin Li 0002, Le Xu, Geon Woo Kim, Donghyun Kim 0002, Hong Xu 0001, Amy Zhang 0001, Aditya Akella. 26887-26903 [doi]
- Walk in Others' Shoes with a Single Glance: Human-Centric Visual Grounding with Top-View Perspective TransformationYuqi Bu, Xin Wu, Zirui Zhao, Yi Cai 0001, David Hsu, Qiong Liu 0006. 26904-26923 [doi]
- Is linguistically-motivated data augmentation worth it?Ray Groshan, Michael Ginn, Alexis Palmer. 26924-26939 [doi]
- From Lists to Emojis: How Format Bias Affects Model AlignmentXuanchang Zhang, Wei Xiong 0015, Lichang Chen, Tianyi Zhou 0001, Heng Huang, Tong Zhang 0001. 26940-26961 [doi]
- Colloquial Singaporean English Style Transfer with Fine-Grained Explainable ControlJinggui Liang, Dung Vo, Yap Hong Xian, Hai Leong Chieu, Kian Ming Adam Chai, Jing Jiang 0001, Lizi Liao. 26962-26983 [doi]
- From Informal to Formal - Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal ProofsJialun Cao, Yaojie Lu 0001, Meiziniu Li, Haoyang Ma, Haokun Li, Mengda He, Cheng Wen 0002, Le Sun 0001, Hongyu Zhang 0002, Shengchao Qin, Shing-Chi Cheung, Cong Tian. 26984-27003 [doi]
- CoAM: Corpus of All-Type Multiword ExpressionsYusuke Ide, Joshua Tanner, Adam Nohejl, Jacob Hoffman, Justin Vasselli, Hidetaka Kamigaito, Taro Watanabe. 27004-27021 [doi]
- SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented GenerationZijun Yao 0002, Weijian Qi, Liangming Pan, Shulin Cao, Linmei Hu, Weichuan Liu, Lei Hou 0001, Juanzi Li. 27022-27043 [doi]
- Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical ReasoningJoykirat Singh, Akshay Uttama Nambi, Vibhav Vineet. 27044-27065 [doi]
- Understanding the Dark Side of LLMs' Intrinsic Self-CorrectionQingjie Zhang, Di Wang, Haoting Qian, Yiming Li 0004, Tianwei Zhang 0004, Minlie Huang, Ke Xu 0002, Hewu Li, Liu Yan, Han Qiu 0001. 27066-27101 [doi]
- VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video ComprehensionXinyu Chen 0003, Yunxin Li, Haoyuan Shi, Baotian Hu, Wenhan Luo, Yaowei Wang 0001, Min Zhang 0005. 27102-27128 [doi]
- What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best PracticesZhi Chen 0006, Qiguang Chen, Libo Qin 0001, Qipeng Guo, Haijun Lv, Yicheng Zou, Hang Yan 0001, Kai Chen 0026, Dahua Lin. 27129-27151 [doi]
- Knowledge Graph Retrieval-Augmented Generation for LLM-based RecommendationShijie Wang 0002, Wenqi Fan, Yue Feng 0002, Shanru Lin, Xinyu Ma, Shuaiqiang Wang, Dawei Yin. 27152-27168 [doi]
- SudoLM: Learning Access Control of Parametric Knowledge with Authorization AlignmentQin Liu 0010, Fei Wang 0060, Chaowei Xiao, Muhao Chen 0001. 27169-27181 [doi]
- I0T: Embedding Standardization Method Towards Zero Modality GapNa Min An, Eunki Kim, James Thorne, Hyunjung Shim. 27182-27199 [doi]
- Odysseus Navigates the Sirens' Song: Dynamic Focus Decoding for Factual and Diverse Open-Ended Text GenerationWen Luo 0001, Feifan Song 0001, Wei Li 0101, Guangyue Peng, Shaohang Wei, Houfeng Wang. 27200-27218 [doi]
- Better Embeddings with Coupled AdamFelix Stollenwerk, Tobias Stollenwerk. 27219-27236 [doi]
- Bone Soups: A Seek-and-Soup Model Merging Approach for Controllable Multi-Objective GenerationGuofu Xie, Xiao Zhang, Ting Yao, Yunsheng Shi. 27237-27263 [doi]
- Controllable and Reliable Knowledge-Intensive Task-Oriented Conversational Agents with Declarative Genie WorksheetsHarshit Joshi, Shicheng Liu, James Chen, Larsen Weigle, Monica S. Lam. 27264-27308 [doi]
- Benchmarking Long-Context Language Models on Long Code UnderstandingJia Li, Xuyuan Guo, Lei Li, Kechi Zhang, Ge Li, Zhengwei Tao, Fang Liu, Chongyang Tao, Yuqi Zhu, Zhi Jin. 27309-27327 [doi]
- MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling CapabilitiesSavya Khosla, Aditi Tiwari, Kushal Kafle, Simon Jenni, Handong Zhao, John P. Collomosse, Jing Shi 0005. 27328-27346 [doi]
- Internal Value Alignment in Large Language Models through Controlled Value Vector ActivationHaoran Jin, Meng Li, Xiting Wang, Zhihao Xu 0003, Minlie Huang, Yantao Jia, Defu Lian. 27347-27371 [doi]
- A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better InterpretabilityXinyu Hu 0001, Mingqi Gao 0002, Li Lin 0014, Zhenghan Yu, Xiaojun Wan 0001. 27372-27395 [doi]
- Recurrent Knowledge Identification and Fusion for Language Model Continual LearningYujie Feng, Xujia Wang, Zexin Lu, Shenghong Fu, Guangyuan Shi, Yongxin Xu, Yasha Wang, Philip S. Yu, Xu Chu, Xiao-Ming Wu 0003. 27396-27413 [doi]
- Data-Constrained Synthesis of Training Data for De-IdentificationThomas Vakili, Aron Henriksson, Hercules Dalianis. 27414-27427 [doi]
- Just a Scratch: Enhancing LLM Capabilities for Self-harm Detection through Intent Differentiation and Emoji InterpretationSoumitra Ghosh, Gopendra Vikram Singh, Shambhavi, Sabarna Choudhury, Asif Ekbal. 27428-27445 [doi]
- Contrastive Learning on LLM Back Generation Treebank for Cross-domain Constituency ParsingPeiming Guo, Meishan Zhang, Jianling Li, Min Zhang 0005, Yue Zhang 0004. 27446-27458 [doi]
- MMDEND: Dendrite-Inspired Multi-Branch Multi-Compartment Parallel Spiking Neuron for Sequence ModelingKexin Wang, Yuhong Chou, Di Shang, Shijie Mei, Jiahong Zhang, Yanbin Huang, Man Yao, Bo Xu 0002, Guoqi Li. 27459-27470 [doi]
- Understanding Impact of Human Feedback via Influence FunctionsTaywon Min, Haeone Lee, Yongchan Kwon, Kimin Lee. 27471-27500 [doi]
- T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive ConceptsZiwei Huang, Wanggui He, Quanyu Long, Yandi Wang, Haoyuan Li, Zhelun Yu, Fangxun Shu, Weilong Dai, Hao Jiang, Fei Wu 0001, Leilei Gan. 27501-27524 [doi]
- InspireDebate: Multi-Dimensional Subjective-Objective Evaluation-Guided Reasoning and Optimization for DebatingFuyu Wang, Jiangtong Li, Kun Zhu 0036, Changjun Jiang. 27525-27544 [doi]
- OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and OptimizationHongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu 0002, Hongming Zhang 0009, Tianqing Fang, Zhenzhong Lan, Dong Yu 0001. 27545-27564 [doi]
- FOCUS: Evaluating Pre-trained Vision-Language Models on Underspecification ReasoningKankan Zhou, Eason Lai, Kyriakos Mouratidis, Jing Jiang. 27565-27584 [doi]
- Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram DescriptionsWan Ju Kang, Eunki Kim, Na Min An, SangRyul Kim, Haemin Choi, Ki Hoon Kwak, James Thorne. 27585-27621 [doi]
- Personal Travel Solver: A Preference-Driven LLM-Solver System for Travel PlanningZijian Shao, Jiancan Wu, Weijian Chen 0001, Xiang Wang 0010. 27622-27642 [doi]
- Counterspeech the ultimate shield! Multi-Conditioned Counterspeech Generation through Attributed Prefix LearningAswini Kumar Padhi, Anil Bandhakavi, Tanmoy Chakraborty 0002. 27643-27663 [doi]
- LLM×MapReduce: Simplified Long-Sequence Processing using Large Language ModelsZihan Zhou, Chong Li, Xinyi Chen, Shuo Wang 0013, Yu Chao, Zhili Li, Haoyu Wang, Qi Shi 0002, Zhixing Tan, Xu Han 0007, Xiaodong Shi, Zhiyuan Liu 0001, Maosong Sun 0001. 27664-27678 [doi]
- CheXalign: Preference fine-tuning in chest X-ray interpretation models without human feedbackDennis Hein, Zhihong Chen, Sophie Ostmeier, Justin Xu, Maya Varma, Eduardo Pontes Reis, Arne Edward Michalson, Christian Bluethgen, Hyun Joo Shin, Curtis P. Langlotz, Akshay S. Chaudhari. 27679-27702 [doi]
- Knowledge Tracing in Programming Education Integrating Students' QuestionsDoyoun Kim, Suin Kim, Yohan Jo. 27703-27718 [doi]
- PRISM: A Framework for Producing Interpretable Political Bias Embeddings with Political-Aware Cross-EncoderYiqun Sun, Qiang Huang, Anthony Kum Hoe Tung, Jun Yu. 27719-27733 [doi]
- Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and AttitudesMeng Li, Michael Vrazitulis, David Schlangen. 27734-27757 [doi]
- Lexical Diversity-aware Relevance Assessment for Retrieval-Augmented GenerationZhange Zhang, Yuqing Ma, Yulong Wang, Shan He, Tianbo Wang, Siqi He, Jiakai Wang, Xianglong Liu 0001. 27758-27781 [doi]
- Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual ChainsJuntian Zhang, Chuanqi Cheng, Yuhan Liu, Wei Liu, Jian Luan, Rui Yan. 27782-27798 [doi]
- Online Iterative Self-Alignment for Radiology Report GenerationTing Xiao, Lei Shi 0004, Yang Zhang, Haofeng Yang, Zhe Wang 0008, Chenjia Bai. 27799-27814 [doi]
- Chinese Inertial GAN for Handwriting Signal Generation and RecognitionYifeng Wang, Yi Zhao. 27815-27832 [doi]
- LLMs Caught in the Crossfire: Malware Requests and Jailbreak ChallengesHaoyang Li, Huan Gao, Zhiyuan Zhao 0005, Zhiyu Lin, Junyu Gao 0001, Xuelong Li 0001. 27833-27848 [doi]
- Evaluating Sequence Labeling on the basis of Information TheoryEnrique Amigó, Elena Álvarez Mellado, Julio Gonzalo, Jorge Carrillo-de-Albornoz. 27849-27860 [doi]
- GRAT: Guiding Retrieval-Augmented Reasoning through Process Rewards Tree SearchXianshu Peng, Wei Wei 0002. 27861-27875 [doi]
- T-REG: Preference Optimization with Token-Level Reward RegularizationWenxuan Zhou, Shujian Zhang, Lingxiao Zhao, Tao Meng. 27876-27889 [doi]
- Gödel Agent: A Self-Referential Agent Framework for Recursively Self-ImprovementXunjian Yin, Xinyi Wang 0003, Liangming Pan, Li Lin 0014, Xiaojun Wan 0001, William Yang Wang. 27890-27913 [doi]
- AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse EnvironmentsZhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Xin Guo, Dingwen Yang, Chenyang Liao, Wei He 0024, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang 0001, Xipeng Qiu, Xuanjing Huang 0001, Zuxuan Wu, Yu-Gang Jiang 0001. 27914-27961 [doi]
- Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability TheoryYexiang Liu, Zekun Li 0007, Zhi Fang, Nan Xu, Ran He 0001, Tieniu Tan. 27962-27994 [doi]
- Information Locality as an Inductive Bias for Neural Language ModelsTaiga Someya, Anej Svete, Brian DuSell, Timothy J. O'Donnell, Mario Giulianelli, Ryan Cotterell. 27995-28013 [doi]
- Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language ModelsAdrián Bazaga, Rexhina Blloshmi, Bill Byrne, Adrià de Gispert. 28014-28033 [doi]
- Query-driven Document-level Scientific Evidence Extraction from Biomedical StudiesMassimiliano Pronesti, Joao H. Bettencourt-Silva, Paul Flanagan, Alessandra Pascale, Oisin Redmond, Anya Belz, Yufang Hou 0001. 28034-28051 [doi]
- Towards Robust Universal Information Extraction: Dataset, Evaluation, and SolutionJizhao Zhu, Akang Shi, Zixuan Li 0001, Long Bai 0002, Xiaolong Jin 0001, Jiafeng Guo, Xueqi Cheng. 28052-28070 [doi]
- Multi-perspective Alignment for Increasing Naturalness in Neural Machine TranslationHuiyuan Lai, Esther Ploeger, Rik van Noord, Antonio Toral. 28071-28084 [doi]
- Temporal reasoning for timeline summarisation in social mediaJiayu Song, Mahmud Elahi Akhter, Dana Atzil-Slonim, Maria Liakata. 28085-28101 [doi]
- Beyond Negative Stereotypes - Non-Negative Abusive Utterances about Identity Groups and Their Semantic VariantsTina Lommel, Elisabeth Eder, Josef Ruppenhofer, Michael Wiegand. 28102-28120 [doi]
- Persistent Homology of Topic Networks for the Prediction of Reader CuriosityManuel D. S. Hopp, Vincent Labatut, Arthur Amalvy, Richard Dufour, Hannah Stone, Hayley K. Jach, Kou Murayama. 28121-28132 [doi]
- Tokenisation is NP-CompletePhilip Whittington, Gregor Bachmann, Tiago Pimentel. 28133-28153 [doi]
- Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum LearningAndrei Mircea, Supriyo Chakraborty, Nima Chitsazan, Irina Rish, Ekaterina Lobacheva. 28154-28188 [doi]
- Parameter-Aware Contrastive Knowledge Editing: Tracing and Rectifying based on Critical Transmission PathsSonglin Zhai, Yuan Meng, Yuxin Zhang, Guilin Qi. 28189-28200 [doi]
- Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent SystemHaoyang Su, Renqi Chen, Shixiang Tang, Zhenfei Yin, Xinzhe Zheng, Jinzhe Li, Biqing Qi, Qi Wu, Hui Li 0037, Wanli Ouyang, Philip Torr 0001, Bowen Zhou 0002, Nanqing Dong. 28201-28240 [doi]
- Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal ThinkingYilong Chen, Junyuan Shang, Zhenyu Zhang 0006, Yanxi Xie, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun 0029, Hua Wu 0003, Haifeng Wang 0001. 28241-28259 [doi]
- Document-Level Text Generation with Minimum Bayes Risk Decoding using Optimal TransportYuu Jinnai. 28260-28279 [doi]
- Opt-Out: Investigating Entity-Level Unlearning for Large Language Models via Optimal TransportMinseok Choi, Daniel Rim, Dohyun Lee, Jaegul Choo. 28280-28297 [doi]
- Mixture of Small and Large Models for Chinese Spelling CheckZiheng Qiao, Houquan Zhou 0001, Zhenghua Li. 28298-28311 [doi]
- DISC: Plug-and-Play Decoding Intervention with Similarity of Characters for Chinese Spelling CheckZiheng Qiao, Houquan Zhou 0001, Yumeng Liu, Zhenghua Li, Min Zhang 0005, Bo Zhang 0071, Chen Li 0001, Ji Zhang 0011, Fei Huang 0002. 28312-28324 [doi]
- Causal Estimation of Tokenisation BiasPietro Lesci, Clara Meister, Thomas Hofmann, Andreas Vlachos 0001, Tiago Pimentel. 28325-28340 [doi]
- Value Residual LearningZhanchao Zhou, Tianyi Wu, Zhiyun Jiang, Fares Obeid, Zhenzhong Lan. 28341-28356 [doi]
- SGIC: A Self-Guided Iterative Calibration Framework for RAGGuanhua Chen 0006, Yutong Yao, Lidia S. Chao, Xuebo Liu 0002, Derek F. Wong. 28357-28370 [doi]
- NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous ScriptsMuhammad Farid Adilazuarda, Musa Izzanardi Wijanarko, Lucky Susanto, Khumaisa Nur'Aini, Derry Tanti Wijaya, Alham Fikri Aji. 28371-28401 [doi]
- LLM-based Rumor Detection via Influence Guided Sample Selection and Game-based Perspective AnalysisZhiliang Tian, Jingyuan Huang, Zejiang He, Zhen Huang 0006, Menglong Lu, Linbo Qiao, Songzhu Mei, Yijie Wang 0001, Dongsheng Li. 28402-28414 [doi]
- Hierarchical-Task-Aware Multi-modal Mixture of Incremental LoRA Experts for Embodied Continual LearningZiqi Jia, Anmin Wang, Xiaoyang Qu, Xiaowen Yang, Jianzong Wang. 28415-28427 [doi]
- SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep LayersZicong Tang, Luohe Shi, Zuchao Li, Baoyuan Qi, Liu Guoming, Lefei Zhang, Ping Wang 0028. 28428-28442 [doi]
- Medical Graph RAG: Evidence-based Medical Large Language Model via Graph Retrieval-Augmented GenerationJunDe Wu, Jiayuan Zhu, Yunli Qi, Jingkun Chen, Min Xu, Filippo Menolascina, Yueming Jin, Vicente Grau. 28443-28467 [doi]
- Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language ModelsSeungcheol Park, Jeongin Bae, Beomseok Kwon, Minjun Kim 0010, Byeongwook Kim, Se Jung Kwon, U Kang, Dongsoo Lee. 28468-28488 [doi]
- Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic ToolsJunDe Wu, Jiayuan Zhu, Yuyuan Liu, Min Xu, Yueming Jin. 28489-28503 [doi]
- Probing Relative Interaction and Dynamic Calibration in Multi-modal Entity AlignmentChenxiao Li, Jingwei Cheng, Qiang Tong 0003, Fu Zhang 0001, Cairui Wang. 28504-28516 [doi]
- Learn to Memorize: Scalable Continual Learning in Semiparametric Models with Mixture-of-Neighbors Induction MemoryGuangyue Peng, Tao Ge 0001, Wen Luo 0001, Wei Li 0101, Houfeng Wang. 28517-28531 [doi]
- Adverse Event Extraction from Discharge Summaries: A New Dataset, Annotation Scheme, and Initial FindingsImane Guellil, Salomé Andres, Atul Anand, Bruce Guthrie, Huayu Zhang, Abul Hasan, Honghan Wu, Beatrice Alex. 28532-28562 [doi]
- Speed Up Your Code: Progressive Code Acceleration Through Bidirectional Tree EditingLonghui Zhang, Jiahao Wang, Meishan Zhang, GaoXiong Cao, Ensheng Shi, Mayuchi Mayuchi, Jun Yu, Honghai Liu 0001, Jing Li, Min Zhang 0005. 28563-28576 [doi]
- Multi-Facet Blending for Faceted Query-by-Example RetrievalHeejin Do, Sangwon Ryu, Jonghwi Kim, Gary Lee 0001. 28577-28590 [doi]
- PIPER: Benchmarking and Prompting Event Reasoning Boundary of LLMs via Debiasing-Distillation Enhanced TuningZhicong Lu, Changyuan Tian, Peiguang Li, Li Jin 0001, Sirui Wang, Wei Jia, Ying Shen, Guangluan Xu. 28591-28613 [doi]
- MIR: Methodology Inspiration Retrieval for Scientific Research ProblemsAniketh Garikaparthi, Manasi Patwardhan 0001, Aditya Sanjiv Kanade, Aman Hassan, Lovekesh Vig, Arman Cohan. 28614-28659 [doi]
- Sticking to the Mean: Detecting Sticky Tokens in Text Embedding ModelsKexin Chen, Dongxia Wang, Yi Liu, Haonan Zhang, Wenhai Wang. 28660-28681 [doi]
- Memorizing is Not Enough: Deep Knowledge Injection Through ReasoningRuoxi Xu, Yunjie Ji, Boxi Cao, Yaojie Lu 0001, Hongyu Lin, Xianpei Han, Ben He, Yingfei Sun, Xiangang Li, Le Sun 0001. 28682-28693 [doi]
- Improving Dialogue State Tracking through Combinatorial Search for In-Context ExamplesHaesung Pyun, Yoonah Park, Yohan Jo. 28694-28714 [doi]
- Pretraining Context Compressor for Large Language Models with Embedding-Based MemoryYuhong Dai, Jianxun Lian, Yitian Huang, Wei Zhang, Mingyang Zhou, Mingqi Wu, Xing Xie, Hao Liao. 28715-28732 [doi]
- Dialogue Systems for Emotional Support via Value ReinforcementJuhee Kim, Chunghu Mok, Jisun Lee, Hyang-Sook Kim, Yohan Jo. 28733-28766 [doi]
- Length-Induced Embedding Collapse in PLM-based ModelsYuqi Zhou 0001, Sunhao Dai, Zhanshuo Cao, Xiao Zhang 0034, Jun Xu 0001. 28767-28791 [doi]
- SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster PredictionShester Gueuwou, Xiaodan Du 0001, Greg Shakhnarovich, Karen Livescu, Alexander H. Liu. 28792-28810 [doi]
- ERU-KG: Efficient Reference-aligned Unsupervised Keyphrase GenerationLam Thanh Do, Aaditya Bodke, Pritom Saha Akash, Kevin Chen-Chuan Chang. 28811-28829 [doi]
- Know Your Mistakes: Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability ModelingSuvodip Dey, Yi-Jyun Sun, Gokhan Tur, Dilek Hakkani-Tür. 28830-28843 [doi]
- LLMs Trust Humans More, That's a Problem! Unveiling and Mitigating the Authority Bias in Retrieval-Augmented GenerationYuxuan Li, Xinwei Guo, Jiashi Gao, Guanhua Chen 0001, Xiangyu Zhao 0001, Jiaxin Zhang, Quanying Liu, Haiyan Wu, Xin Yao 0001, Xuetao Wei. 28844-28858 [doi]
- Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool InvocationDongsheng Zhu, Weixian Shi, Zhengliang Shi, Zhaochun Ren, Shuaiqiang Wang, Lingyong Yan, Dawei Yin. 28859-28875 [doi]
- Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document RestorationYuyi Zhang, Peirong Zhang, Zhenhua Yang, Pengyu Yan, Yongxin Shi, Pengwei Liu, Fengjun Guo, Lianwen Jin. 28876-28892 [doi]
- PopAlign: Diversifying Contrasting Patterns for a More Comprehensive AlignmentZekun Moore Wang, Shenzhi Wang, King Zhu, Jiaheng Liu, Ke Xu 0001, Jie Fu 0001, Wangchunshu Zhou, Wenhao Huang. 28893-28921 [doi]
- Robust Utility-Preserving Text Anonymization Based on Large Language ModelsTianyu Yang 0004, Xiaodan Zhu, Iryna Gurevych. 28922-28941 [doi]
- SEAL: Scaling to Emphasize Attention for Long-Context RetrievalChanghun Lee, Minsang Seok, Jungyu Jin, Younghyun Cho, Eunhyeok Park. 28942-28955 [doi]
- From Neurons to Semantics: Evaluating Cross-Linguistic Alignment Capabilities of Large Language Models via Neurons AlignmentChongxuan Huang, Yongshi Ye, Biao Fu, Qifeng Su, Xiaodong Shi. 28956-28974 [doi]
- $\mathcal{A}^3$: Automatic Alignment Framework for Attributed Text GenerationYue Wang 0039, Haoke Zhang, Juntao Li, Jinxiong Chang, Min Zhang 0005. 28975-28990 [doi]
- Towards Better Value Principles for Large Language Model Alignment: A Systematic Evaluation and EnhancementBingbing Xu, Jing Yao, Xiaoyuan Yi, Aishan Maoliniyazi, Xing Xie 0001, Xiaofeng Meng 0001. 28991-29010 [doi]
- Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More MoreArvid Frydenlund. 29011-29059 [doi]
- Diversity Explains Inference Scaling Laws: Through a Case Study of Minimum Bayes Risk DecodingHidetaka Kamigaito, Hiroyuki Deguchi 0002, Yusuke Sakai 0010, Katsuhiko Hayashi 0001, Taro Watanabe. 29060-29094 [doi]
- Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language ModelsIdo Cohen, Daniela Gottesman, Mor Geva, Raja Giryes. 29095-29108 [doi]
- SDD: Self-Degraded Defense against Malicious Fine-tuningZixuan Chen, Weikai Lu, Xin Lin, Ziqian Zeng. 29109-29125 [doi]
- CoachMe: Decoding Sport Elements with a Reference-Based Coaching Instruction Generation ModelWei-Hsin Yeh, Yu-An Su, Chih-Ning Chen, Yi-Hsueh Lin, Calvin Ku, Wenhsin Chiu, Min-Chun Hu 0001, Lun-Wei Ku. 29126-29151 [doi]
- DRPruning: Efficient Large Language Model Pruning through Distributionally Robust OptimizationHexuan Deng, Wenxiang Jiao, Xuebo Liu, Jing Li, Min Zhang, Zhaopeng Tu. 29152-29173 [doi]
- How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMsKarin de Langis, Jong Inn Park, Andreas Schramm, Bin Hu, Khanh Chi Le, Dongyeop Kang. 29174-29191 [doi]
- Data Caricatures: On the Representation of African American Language in Pretraining CorporaNicholas Deas, Blake Vente, Amith Ananthram, Jessica Grieser, Desmond Upton Patton, Shana Kleiner, James R. Shepard III, Kathleen McKeown. 29192-29217 [doi]
- Language Model Probabilities are Not Calibrated in Numeric ContextsCharles Lovering, Michael Krumdick, Viet Dac Lai, Varshini Reddy, Seth Ebner, Nilesh Kumar, Rik Koncel-Kedziorski, Chris Tanner. 29218-29257 [doi]
- MDCure: A Scalable Pipeline for Multi-Document Instruction-FollowingGabrielle Kaili-May Liu, Bowen Shi, Avi Caciularu, Idan Szpektor, Arman Cohan. 29258-29296 [doi]
- Cross-Lingual Auto Evaluation for Assessing Multilingual LLMsSumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Dilip Venkatesh, Raj Dabre, Anoop Kunchukuttan, Mitesh M. Khapra. 29297-29329 [doi]
- DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking ProcessMinjun Zhu, Yixuan Weng, Linyi Yang, Yue Zhang 0004. 29330-29355 [doi]
- Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy GradientYuan Gao 0015, Zujing Liu, Weizhong Zhang, Bo Du 0001, Gui-Song Xia. 29356-29377 [doi]
- Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative AnalysisPriyanka Kargupta, Ishika Agarwal, Tal August, Jiawei Han 0001. 29378-29403 [doi]
- Hierarchical Memory Organization for Wikipedia GenerationEugene J. Yu, Dawei Zhu, Yifan Song 0002, Xiangyu Wong, Jiebin Zhang, Wenxuan Shi, Xiaoguang Li, Qun Liu 0001, Sujian Li. 29404-29427 [doi]
- Class Distillation with Mahalanobis Contrast: An Efficient Training Paradigm for Pragmatic Language Understanding TasksChenlu Wang, Weimin Lyu, Ritwik Banerjee. 29428-29442 [doi]
- Structure-aware Domain Knowledge Injection for Large Language ModelsKai Liu 0023, Ze Chen 0001, Zhihang Fu, Wei Zhang 0090, Rongxin Jiang 0001, Fan Zhou 0007, Yaowu Chen, Yue Wu, Jieping Ye. 29443-29464 [doi]
- FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning EvaluationJunyu Luo 0002, Zhizhuo Kou, Liming Yang, Xiao Luo 0001, Jinsheng Huang, Zhiping Xiao 0001, Jingshu Peng, Chengzhong Liu, Jiaming Ji, Xuanzhe Liu, Sirui Han, Ming Zhang 0004, Yike Guo. 29465-29489 [doi]
- Dialectal Coverage And Generalization in Arabic Speech RecognitionAmirbek Djanibekov, Hawau Olamide Toyin, Raghad Alshalan, Abdullah Alatir, Hanan Aldarmaki. 29490-29502 [doi]
- EditInspector: A Benchmark for Evaluation of Text-Guided Image EditsRon Yosef, Yonatan Bitton, Dani Lischinski, Moran Yanuka. 29503-29530 [doi]
- Reconsidering LLM Uncertainty Estimation Methods in the WildYavuz Faruk Bakman, Duygu Nur Yaldiz, Sungmin Kang, Tuo Zhang, Baturalp Buyukates, Salman Avestimehr, Sai Praneeth Karimireddy. 29531-29556 [doi]
- Bregman Conditional Random Fields: Sequence Labeling with Parallelizable Inference AlgorithmsCaio Corro, Mathieu Lacroix 0001, Joseph Le Roux. 29557-29574 [doi]
- SEE: Strategic Exploration and Exploitation for Cohesive In-Context Prompt OptimizationWendi Cui, Jiaxin Zhang 0005, Zhuohang Li, Hao Sun, Damien Lopez, Kamalika Das, Bradley A. Malin, Kumar Sricharan. 29575-29627 [doi]
- Programming by Example meets Historical Linguistics: A Large Language Model Based Approach to Sound Law InductionAtharva Naik, Darsh Agrawal, Hong Sng, Clayton Marr, Kexun Zhang, Nathaniel Romney Robinson, Kalvin Chang, Rebecca Byrnes, Aravind Mysore, Carolyn P. Rosé, David R. Mortensen. 29628-29647 [doi]
- Synergizing Unsupervised Episode Detection with LLMs for Large-Scale News EventsPriyanka Kargupta, Yunyi Zhang 0001, Yizhu Jiao, Siru Ouyang, Jiawei Han 0001. 29648-29663 [doi]
- Beyond True or False: Retrieval-Augmented Hierarchical Analysis of Nuanced ClaimsPriyanka Kargupta, Runchu Tian, Jiawei Han 0001. 29664-29679 [doi]
- The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM AgentsFeiran Jia, Tong Wu, Xin Qin, Anna Cinzia Squicciarini. 29680-29697 [doi]
- Sandcastles in the Storm: Revisiting the (Im)possibility of Strong WatermarkingFabrice Harel-Canada, Boran Erol, Connor Choi, Jason Liu, Gary Jiarui Song, Nanyun Peng 0001, Amit Sahai. 29698-29735 [doi]
- Time-MQA: Time Series Multi-Task Question Answering with Context EnhancementYaxuan Kong, Yiyuan Yang, Yoontae Hwang, Wenjie Du, Stefan Zohren, Zhangyang Wang, Ming Jin 0005, Qingsong Wen. 29736-29753 [doi]
- From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMsRuxiao Chen, Chenguang Wang 0012, Yuran Sun, Xilei Zhao, Susu Xu. 29754-29778 [doi]
- GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent ReasoningShikhhar Siingh, Abhinav Rawat, Chitta Baral, Vivek Gupta. 29779-29800 [doi]
- Hanging in the Balance: Pivotal Moments in Crisis Counseling ConversationsVivian Nguyen, Lillian Lee, Cristian Danescu-Niculescu-Mizil. 29801-29817 [doi]
- Unveiling the Potential of BERT-family: A New Recipe for Building Scalable, General and Competitive Large Language ModelsYisheng Xiao, Juntao Li, Wenpeng Hu, Zhunchen Luo, Min Zhang. 29818-29833 [doi]
- TaxoAdapt: Aligning LLM-Based Multidimensional Taxonomy Construction to Evolving Research CorporaPriyanka Kargupta, Nan Zhang, Yunyi Zhang 0001, Rui Zhang 0037, Prasenjit Mitra, Jiawei Han 0001. 29834-29850 [doi]
- An Empirical Study of Iterative Refinements for Non-autoregressive TranslationYisheng Xiao, Pei Guo, Zechen Sun, Juntao Li, Kai Song, Min Zhang. 29851-29865 [doi]
- Retrofitting Large Language Models with Dynamic TokenizationDarius Feher, Ivan Vulic, Benjamin Minixhofer. 29866-29883 [doi]
- Principled Content Selection to Generate Diverse and Personalized Multi-Document SummariesVishakh Padmakumar, Zichao Wang 0001, David Arbour, Jennifer Healey. 29884-29899 [doi]
- Bilingual Zero-Shot Stance DetectionChenye Zhao, Cornelia Caragea. 29900-29919 [doi]
- GrammaMT: Improving Machine Translation with Grammar-Informed In-Context LearningRita Ramos, Everlyn Asiko Chimoto, Maartje ter Hoeve, Natalie Schluter. 29920-29940 [doi]
- Theorem Prover as a Judge for Synthetic Data GenerationJoshua Ong Jun Leang, Giwon Hong, Wenda Li, Shay B. Cohen. 29941-29977 [doi]
- Measuring the Effect of Transcription Noise on Downstream Language Understanding TasksOri Shapira, Shlomo E. Chazan, Amir David Nissan Cohen. 29978-30004 [doi]
- Assessing Reliability and Political Bias In LLMs' Judgements of Formal and Material Inferences With Partisan ConclusionsReto Gubelmann, Ghassen Karray. 30005-30031 [doi]
- PARME: Parallel Corpora for Low-Resourced Middle Eastern LanguagesSina Ahmadi, Rico Sennrich, Erfan Karami, Ako Marani, Parviz Fekrazad, Gholamreza Akbarzadeh Baghban, Hanah Hadi, Semko Heidari, Mahîr Dogan, Pedram Asadi, Dashne Bashir, Mohammad Amin Ghodrati, Kourosh Amini, Zeynab Ashourinezhad, Mana Baladi, Farshid Ezzati, Alireza Ghasemifar, Daryoush Hosseinpour, Behrooz Abbaszadeh, Amin Hassanpour, Bahaddin Jalal Hamaamin, Saya Kamal Hama, Ardeshir Mousavi, Sarko Nazir Hussein, Isar Nejadgholi, Mehmet Ölmez, Horam Osmanpour, Rashid Roshan Ramezani, Aryan Sediq Aziz, Ali Salehi, Mohammadreza Yadegari, Kewyar Yadegari, Sedighe Zamani Roodsari. 30032-30053 [doi]
- METAL: A Multi-Agent Framework for Chart Generation with Test-Time ScalingBingxuan Li, Yiwei Wang 0001, Jiuxiang Gu, Kai-Wei Chang, Nanyun Peng 0001. 30054-30069 [doi]
- ConLoan: A Contrastive Multilingual Dataset for Evaluating LoanwordsSina Ahmadi, Micha David Hess, Elena Álvarez Mellado, Alessia Battisti, Cui Ding, Anne Göhring, Yingqiang Gao, Zifan Jiang, Andrianos Michail, Peshmerge Morad, Joel Niklaus, Maria Christina Panagiotopoulou, Stefano Perrella, Juri Opitz, Anastassia Shaitarova, Rico Sennrich. 30070-30090 [doi]
- A Theory of Response Sampling in LLMs: Part Descriptive and Part PrescriptiveSarath Sivaprasad, Pramod Kaushik, Sahar Abdelnabi, Mario Fritz. 30091-30135 [doi]
- MEraser: An Effective Fingerprint Erasure Approach for Large Language ModelsJingxuan Zhang, Zhenhua Xu, Rui Hu, Wenpeng Xing, Xuhong Zhang 0001, Meng Han. 30136-30153 [doi]
- VISA: Retrieval Augmented Generation with Visual Source AttributionXueguang Ma, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Wenhu Chen, Jimmy Lin. 30154-30169 [doi]
- DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense RetrieversXueguang Ma, Xi Victoria Lin, Barlas Oguz, Jimmy Lin, Wen-tau Yih, Xilun Chen 0002. 30170-30186 [doi]
- Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMsZiling Cheng, Meng Cao 0003, Marc-Antoine Rondeau, Jackie CK Cheung. 30187-30214 [doi]
- MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement LearningChanwoo Park, Seungju Han 0002, Xingzhi Guo, Asuman E. Ozdaglar, Kaiqing Zhang, Joo-Kyung Kim. 30215-30248 [doi]
- Map&Make: Schema Guided Text to Table GenerationNaman Ahuja, Fenil Denish Bardoliya, Chitta Baral, Vivek Gupta. 30249-30262 [doi]
- IRIS: Interpretable Retrieval-Augmented Classification for Long Interspersed Document SequencesFengnan Li, Elliot D. Hill, Jiang Shu, Jiaxin Gao, Matthew M. Engelhard. 30263-30283 [doi]
- Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive ImagesShengguang Wu, Fan-Yun Sun, Kaiyue Wen, Nick Haber. 30284-30297 [doi]
- Can we Retrieve Everything All at Once? ARM: An Alignment-Oriented LLM-based Retrieval MethodPeter Baile Chen, Yi Zhang 0001, Mike Cafarella, Dan Roth. 30298-30317 [doi]
- R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic MemoryTenghao Huang, Kinjal Basu 0002, Ibrahim Abdelaziz, Pavan Kapanipathi, Jonathan May, Muhao Chen 0001. 30318-30330 [doi]
- FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and StereotypesJanki Atul Nawale, Mohammed Safi Ur Rahman Khan, Janani D, Mansi Gupta, Danish Pruthi, Mitesh M. Khapra. 30331-30380 [doi]
- SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language ModelsZhen Wan, Chao-Han Huck Yang, Yahan Yu, Jinchuan Tian, Sheng Li 0010, Ke Hu, Zhehuai Chen, Shinji Watanabe 0001, Fei Cheng 0002, Chenhui Chu, Sadao Kurohashi. 30381-30398 [doi]
- Predicting Implicit Arguments in Procedural Video InstructionsAnil Batra, Laura Sevilla-Lara, Marcus Rohrbach, Frank Keller. 30399-30419 [doi]
- PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for FreeHao Li, Xiaogeng Liu, Ning Zhang, Chaowei Xiao. 30420-30437 [doi]
- CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIPTianyu Yang, Lisen Dai, Xiangqi Wang, Minhao Cheng, Yapeng Tian, Xiangliang Zhang 0001. 30438-30452 [doi]
- ViGiL3D: A Linguistically Diverse Dataset for 3D Visual GroundingAustin T. Wang, ZeMing Gong, Angel X. Chang. 30453-30475 [doi]
- The time scale of redundancy between prosody and linguistic contextTamar I. Regev, Chiebuka Ohams, Shaylee Xie, Lukas Wolf, Evelina Fedorenko, Alex Warstadt, Ethan Wilcox, Tiago Pimentel. 30476-30488 [doi]
- Basic Reading DistillationZhi Zhou, Sirui Miao, Xiangyu Duan, Hao Yang, Min Zhang. 30489-30502 [doi]
- Quantized Can Still Be Calibrated: A Unified Framework to Calibration in Quantized Large Language ModelsMingyu Zhong, Guanchu Wang, Yu-Neng Chuang, Na Zou 0001. 30503-30517 [doi]
- A Spatio-Temporal Point Process for Fine-Grained Modeling of Reading BehaviorFrancesco Ignazio Re, Andreas Opedal, Glib Manaiev, Mario Giulianelli, Ryan Cotterell. 30518-30538 [doi]
- More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting ObjectivesXiaoqing Zhang, Ang Lv, Yuhan Liu, Flood Sung, Wei Liu 0005, Jian Luan 0001, Shuo Shang, Xiuying Chen, Rui Yan 0001. 30539-30552 [doi]
- Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language ModelsFei Wang, Xingchen Wan, Ruoxi Sun 0002, Jiefeng Chen 0001, Sercan Ö. Arik. 30553-30571 [doi]
- SubLIME: Subset Selection via Rank Correlation Prediction for Data-Efficient LLM EvaluationGayathri Saranathan, Cong Xu, Mahammad Parwez Alam, Tarun Kumar, Martin Foltin, Soon Yee Wong, Suparna Bhattacharya. 30572-30593 [doi]
- M³GQA: A Multi-Entity Multi-Hop Multi-Setting Graph Question Answering BenchmarkBoci Peng, Yongchao Liu, Xiaohe Bo, Jiaxin Guo, Yun Zhu, Xuanbo Fan, Chuntao Hong, Yan Zhang. 30594-30620 [doi]
- LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace FusionGuanghao Zhou, Panjia Qiu, Cen Chen 0001, HongYu Li, Jason Chu, Xin Zhang, Jun Zhou 0011. 30621-30638 [doi]
- ETF: An Entity Tracing Framework for Hallucination Detection in Code SummariesKishan Maharaj, Vitobha Munigala, Srikanth G. Tamilselvam, Prince Kumar, Sayandeep Sen, Palani Kodeswaran, Abhijit Mishra, Pushpak Bhattacharyya. 30639-30652 [doi]
- Meta-Tool: Unleash Open-World Function Calling Capabilities of General-Purpose Large Language ModelsShengqian Qin, Yakun Zhu, Linjie Mu, Shaoting Zhang 0001, Xiaofan Zhang 0002. 30653-30677 [doi]
- Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and ReasoningYingJie Zhu, Xuefeng Bai 0001, Kehai Chen, Yang Xiang 0003, Jun Yu, Min Zhang 0005. 30678-30701 [doi]
- ISR: Self-Refining Referring Expressions for Entity GroundingZhuocheng Yu, Bingchan Zhao, Yifan Song 0002, Sujian Li, Zhonghui He. 30702-30714 [doi]
- Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and InferenceSiyuan Wang, Dianyi Wang, Chengxing Zhou, Zejun Li, Zhihao Fan, Xuanjing Huang 0001, Zhongyu Wei. 30715-30727 [doi]
- CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language ModelsYongheng Zhang 0001, Xu Liu, Ruoxi Zhou, Qiguang Chen, Hao Fei 0003, Wenpeng Lu, Libo Qin 0001. 30728-30749 [doi]
- TestNUC: Enhancing Test-Time Computing Approaches and Scaling through Neighboring Unlabeled Data ConsistencyHenry Peng Zou, Zhengyao Gu, Yue Zhou, Yankai Chen 0001, Weizhi Zhang 0001, Liancheng Fang, Yibo Wang 0001, Yangning Li, Kay Liu, Philip S. Yu. 30750-30762 [doi]
- The Esethu Framework: Reimagining Sustainable Dataset Governance and Curation for Low-Resource LanguagesJenalea Rajab, Anuoluwapo Aremu, Everlyn Asiko Chimoto, Dale Dunbar, Graham Morrissey, Fadel Thior, Luandrie Potgieter, Jessica Ojo, Atnafu Lambebo Tonja, Wilhelmina Ndapewa Onyothi Nekoto, Pelonomi Moiloa, Jade Z. Abbott, Vukosi Marivate, Benjamin Rosman. 30763-30776 [doi]
- Theoretical Analysis of Hierarchical Language Recognition and Generation by Transformers without Positional EncodingDaichi Hayakawa, Issei Sato. 30777-30834 [doi]
- Less is More: Explainable and Efficient ICD Code Prediction with Clinical EntitiesJames C. Douglas, Yidong Gan, Ben Hachey, Jonathan K. Kummerfeld. 30835-30847 [doi]
- Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code RepositoriesAlperen Yildiz, Sin G. Teo, Yiling Lou, Yebo Feng, Chong Wang 0013, Dinil Mon Divakaran. 30848-30865 [doi]
- Multi-Modality Expansion and Retention for LLMs through Parameter Merging and DecouplingJunlin Li, Guodong Du 0002, Jing Li 0034, Sim Kuan Goh, Wenya Wang, Yequan Wang, Fangming Liu, Ho-Kin Tang, Saleh Alharbi, Daojing He, Min Zhang 0005. 30866-30887 [doi]
- Serial Lifelong Editing via Mixture of Knowledge ExpertsYuJu Cheng, Yu-Chu Yu, Kai-Po Chang, Yu-Chiang Frank Wang. 30888-30903 [doi]
- A Survey on Efficient Large Language Model Training: From Data-centric PerspectivesJunyu Luo 0002, Bohan Wu, Xiao Luo 0001, Zhiping Xiao 0001, Yiqiao Jin, Rong-Cheng Tu, Nan Yin, Yifan Wang 0014, Jingyang Yuan, Wei Ju, Ming Zhang 0004. 30904-30920 [doi]
- IMOL: Incomplete-Modality-Tolerant Learning for Multi-Domain Fake News Video DetectionZhi Zeng, Jiaying Wu, Minnan Luo, Herun Wan, Xiangzheng Kong, Zihan Ma 0001, Guang Dai, Qinghua Zheng. 30921-30933 [doi]
- DDxTutor: Clinical Reasoning Tutoring System with Differential Diagnosis-Based Structured ReasoningQian Wu, Zheyao Gao, Longfei Gou, Qi Dou 0001. 30934-30957 [doi]
- SocialEval: Evaluating Social Intelligence of Large Language ModelsJinfeng Zhou, Yuxuan Chen, Yihan Shi, Xuanming Zhang, Leqi Lei, Yi Feng, Zexuan Xiong, Miao Yan, Xunzhi Wang, Yaru Cao, Jianing Yin, Shuai Wang, Quanyu Dai, Zhenhua Dong, Hongning Wang, Minlie Huang. 30958-31012 [doi]
- Hidden in Plain Sight: Evaluation of the Deception Detection Capabilities of LLMs in Multimodal SettingsMd Messal Monem Miah, Adrita Anika, Xi Shi, Ruihong Huang. 31013-31034 [doi]
- Analyzing and Mitigating Inconsistency in Discrete Speech Tokens for Neural Codec Language ModelsWenrui Liu 0003, Zhifang Guo, Jin Xu, Yuanjun Lv, Yunfei Chu, Zemin Liu, Junyang Lin. 31035-31046 [doi]
- PlanningArena: A Modular Benchmark for Multidimensional Evaluation of Planning and Tool LearningZihan Zheng, Tianle Cui, Chuwen Xie, Jiahui Pan, Qianglong Chen, Lewei He. 31047-31086 [doi]
- FocusLLM: Precise Understanding of Long Context by Dynamic CondensingZhenyu Li, Yike Zhang, Tengyu Pan, Yutao Sun, Zhichao Duan 0001, Junjie Fang, Rong Han, Zixuan Wang, Jianyong Wang 0001. 31087-31101 [doi]
- Negative Matters: Multi-Granularity Hard-Negative Synthesis and Anchor-Token-Aware Pooling for Enhanced Text EmbeddingsTengyu Pan, Zhichao Duan 0001, Zhenyu Li, Bowen Dong, Ning Liu 0014, Xiuxing Li, Jianyong Wang 0001. 31102-31118 [doi]
- GPT-4 as a Homework Tutor Can Improve Student Engagement and Learning OutcomesAlessandro Vanzo, Sankalan Pal Chowdhury, Mrinmaya Sachan. 31119-31136 [doi]
- Diffusion Models Through a Global Lens: Are They Culturally Inclusive?Zahra Bayramli, Ayhan Suleymanzade, Na Min An, Huzama Ahmad, Eunsu Kim, Junyeong Park, James Thorne, Alice Oh. 31137-31155 [doi]
- Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward ModelingQiyuan Deng, Xuefeng Bai 0001, Kehai Chen, Yaowei Wang 0001, Liqiang Nie, Min Zhang 0005. 31156-31171 [doi]
- English-based acoustic models perform well in the forced alignment of two English-based Pacific CreolesSam Passmore, Lila San Roque, Kirsty Gillespie, Saurabh Nath, Kira Davey, Keira Mullan, Tim Cawley, Jennifer Biggs, Rosey Billington, Bethwyn Evans, Nick Thieberger, Danielle Barth. 31172-31183 [doi]
- Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editingKaishuai Xu, Tiezheng Yu, Wenjun Hou, Yi Cheng, Chak Tou Leong, Liangyou Li, Xin Jiang 0002, Lifeng Shang, Qun Liu 0001, Wenjie Li 0002. 31184-31203 [doi]
- Truth Knows No Language: Evaluating Truthfulness Beyond EnglishBlanca Calvo Figueras, Eneko Sagarzazu, Julen Etxaniz, Jeremy Barnes 0001, Pablo Gamallo 0001, Iria de-Dios-Flores, Rodrigo Agerri. 31204-31218 [doi]
- Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following AbilityYusuke Sakai 0010, Hidetaka Kamigaito, Taro Watanabe. 31219-31238 [doi]
- Batayan: A Filipino NLP benchmark for evaluating Large Language ModelsJann Railey Montalan, Jimson Paulo Layacan, David Demitri Africa, Richell Isaiah Flores, Michael Tuscano Lopez II, Theresa Denise Magsajo, Anjanette Cayabyab, William-Chandra Tjhi. 31239-31273 [doi]
- HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic ClaimsMichiel van der Meer, Pavel Korshunov, Sébastien Marcel, Lonneke van der Plas. 31274-31291 [doi]
- CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global MemoryWeichen Zhang, Chen Gao, Shiquan Yu, Ruiying Peng, Baining Zhao, Qian Zhang, Jinqiang Cui, Xinlei Chen, Yong Li. 31292-31309 [doi]
- It's Not a Walk in the Park! Challenges of Idiom Translation in Speech-to-text SystemsIuliia Zaitova, Badr M. Abdullah, Wei Xue, Dietrich Klakow, Bernd Möbius, Tania Avgustinova. 31310-31322 [doi]
- PolyNarrative: A Multilingual, Multilabel, Multi-domain Dataset for Narrative Extraction from News ArticlesNikolaos Nikolaidis 0004, Nicolas Stefanovitch, Purificação Silvano, Dimitar Iliyanov Dimitrov, Roman Yangarber, Nuno Guimarães, Elisa Sartori, Ion Androutsopoulos, Preslav Nakov, Giovanni Da San Martino, Jakub Piskorski. 31323-31345 [doi]
- A Parameter-Efficient and Fine-Grained Prompt Learning for Vision-Language ModelsYongbin Guo, Shuzhen Li, Zhulin Liu, Tong Zhang 0015, C. L. Philip Chen. 31346-31359 [doi]
- Persona Dynamics: Unveiling the Impact of Persona Traits on Agents in Text-Based GamesSeungwon Lim, Seungbeen Lee, Dongjun Min, Youngjae Yu. 31360-31394 [doi]
- SeedBench: A Multi-task Benchmark for Evaluating Large Language Models in Seed ScienceJie Ying, Zihong Chen, Zhefan Wang, Wanli Jiang, Chenyang Wang, Zhonghang Yuan, Haoyang Su, Huanjun Kong, Fan Yang, Nanqing Dong. 31395-31449 [doi]
- δ-Stance: A Large-Scale Real World Dataset of Stances in Legal ArgumentationAnkita Gupta, Douglas Rice, Brendan T. O'Connor 0001. 31450-31467 [doi]
- Re³Syn: A Dependency-Based Data Synthesis Framework for Long-Context Post-trainingZhiyang Zhang, Ziqiang Liu, Huiming Wang, Renke Shan, Li Kuang, Lu Wang, De Wen Soh. 31468-31480 [doi]
- Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic InteractionsJihyoung Jang, Minwook Bae, Minji Kim, Dilek Hakkani-Tür, Hyounghun Kim. 31481-31512 [doi]
- Multimodal Coreference Resolution for Chinese Social Media Dialogues: Dataset and Benchmark ApproachXingyu Li, Chen Gong 0004, Guohong Fu. 31513-31525 [doi]
- TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value IdentificationYindu Su, Huike Zou, Lin Sun, Ting Zhang 0001, Haiyang Yang, Chen Li Yu, David Lo 0001, Qingheng Zhang, Shuguang Han, Jufeng Chen. 31526-31538 [doi]
- Theory of Mind in Large Language Models: Assessment and EnhancementRuirui Chen 0002, Weifeng Jiang, Chengwei Qin, Cheston Tan. 31539-31558 [doi]
- Completing A Systematic Review in Hours instead of Months with Interactive AI AgentsRui Qiu, Shijie Chen, Yu Su 0011, Po-Yin Yen, Han-Wei Shen. 31559-31593 [doi]
- CMHKF: Cross-Modality Heterogeneous Knowledge Fusion for Weakly Supervised Video Anomaly DetectionGuohua Wang, Shengping Song, Wuchun He, Yongsen Zheng. 31594-31607 [doi]
- CLaSp: In-Context Layer Skip for Self-Speculative DecodingLongze Chen, Renke Shan, Huiming Wang, Lu Wang, Ziqiang Liu, Run Luo, Jiawei Wang, Hamid Alinejad-Rokny, Min Yang 0007. 31608-31618 [doi]
- Teaching Text Agents to Learn Sequential Decision Making from FailureCanasai Kruengkrai, Koichiro Yoshino. 31619-31635 [doi]
- The Harmonic Structure of Information ContoursEleftheria Tsipidi, Samuel Kiegeland, Franz Nowak, Tianyang Xu, Ethan Wilcox, Alex Warstadt, Ryan Cotterell, Mario Giulianelli. 31636-31659 [doi]
- REAL-MM-RAG: A Real-World Multi-Modal Retrieval BenchmarkNavve Wasserman, Roi Pony, Oshri Naparstek, Adi Raz Goldfarb, Eli Schwartz, Udi Barzelay, Leonid Karlinsky. 31660-31683 [doi]
- Only a Little to the Left: A Theory-grounded Measure of Political Bias in Large Language ModelsMats Faulborn, Indira Sen, Max Pellert, Andreas Spitz, David García 0001. 31684-31704 [doi]
- LongSafety: Evaluating Long-Context Safety of Large Language ModelsYida Lu, Jiale Cheng, Zhexin Zhang, Shiyao Cui, Cunxiang Wang, Xiaotao Gu, Yuxiao Dong, Jie Tang 0001, Hongning Wang, Minlie Huang. 31705-31725 [doi]
- Exploiting Contextual Knowledge in LLMs through V-usable Information based Layer EnhancementXiaowei Yuan, Zhao Yang 0004, Ziyang Huang, Yequan Wang, Siqi Fan 0001, Yiming Ju, Jun Zhao 0001, Kang Liu 0001. 31726-31741 [doi]
- Unintended Harms of Value-Aligned LLMs: Psychological and Empirical InsightsSooyung Choi, Jaehyeok Lee, Xiaoyuan Yi, Jing Yao, Xing Xie 0001, JinYeong Bak. 31742-31768 [doi]
- Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal RetrievalHani AlOmari, Anushka Sivakumar, Andrew Zhang, Chris Thomas 0004. 31769-31785 [doi]
- The Noisy Path from Source to Citation: Measuring How Scholars Engage with Past ResearchHong Chen, Misha Teplitskiy, David Jurgens. 31786-31802 [doi]
- MAPLE: Enhancing Review Generation with Multi-Aspect Prompt LEarning in Explainable RecommendationChing-Wen Yang, Zhi-Quan Feng, Ying-Jia Lin, Che-Wei Chen, Kun-da Wu, Hao Xu, Jui-Feng Yao, Hung-Yu Kao. 31803-31821 [doi]
- Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in TransformersClément Dumas, Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West 0001. 31822-31841 [doi]
- Behavioural vs. Representational Systematicity in End-to-End Models: An Opinionated SurveyIvan Vegner, Sydelle de Souza, Valentin Forch, Martha Lewis, Leonidas A. A. Doumas. 31842-31856 [doi]
- Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language ModelsBoheng Sheng, Jiacheng Yao, Meicong Zhang, Guoxiu He. 31857-31876 [doi]
- DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-Hop Question AnsweringRong Cheng, Jinyi Liu 0002, Yan Zheng 0002, Fei Ni 0001, Jiazhen Du, Hangyu Mao, Fuzheng Zhang, Bo Wang 0027, Jianye Hao. 31877-31899 [doi]
- Deliberate Reasoning in Language Models as Structure-Aware Planning with an Accurate World ModelSiheng Xiong, Ali Payani, Yuan Yang, Faramarz Fekri. 31900-31931 [doi]
- Refining Salience-Aware Sparse Fine-Tuning Strategies for Language ModelsXinxin Liu, Aaron Thomas, Cheng Zhang, Jianyi Cheng, Yiren Zhao, Xitong Gao. 31932-31945 [doi]
- Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse AttentionEmily Xiao, Chin-Jou Li, Yilin Zhang, Graham Neubig, Amanda Bertsch. 31946-31958 [doi]
- ScaleBiO: Scalable Bilevel Optimization for LLM Data ReweightingRui Pan 0002, Dylan Zhang, Hanning Zhang, Xingyuan Pan, Minrui Xu, Jipeng Zhang, Renjie Pi, Xiaoyu Wang 0008, Tong Zhang 0001. 31959-31982 [doi]
- PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human PreferenceJiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen 0008, Josef Dai, Boren Zheng, Tianyi Alex Qiu, Jiayi Zhou, Kaile Wang, Boxun Li, Sirui Han, Yike Guo, Yaodong Yang 0001. 31983-32016 [doi]
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient PerspectiveMing Li, Yanhong Li, Tianyi Zhou. 32017-32154 [doi]
- Beyond Text Compression: Evaluating Tokenizers Across ScalesJonas F. Lotz, António Vilarinho Lopes, Stephan Peitz, Hendra Setiawan, Leonardo Emili. 32155-32173 [doi]
- Emergent Abilities of Large Language Models under Continued Pre-training for Language AdaptationAhmed Elhady, Eneko Agirre, Mikel Artetxe. 32174-32186 [doi]
- R-Fairness: Assessing Fairness of Ranking in Subjective DataLorenzo Balzotti, Donatella Firmani, Jerin George Mathew, Riccardo Torlone, Sihem Amer-Yahia. 32187-32199 [doi]
- RePanda: Pandas-powered Tabular Verification and ReasoningAtoosa Malemir Chegini, Keivan Rezaei, Hamid Eghbalzadeh, Soheil Feizi. 32200-32212 [doi]
- Towards Style Alignment in Cross-Cultural TranslationShreya Havaldar, Adam Stein, Eric Wong 0001, Lyle H. Ungar. 32213-32230 [doi]
- TiC-LM: A Web-Scale Benchmark for Time-Continual LLM PretrainingJeffrey Li, Mohammadreza Armandpour, Seyed-Iman Mirzadeh, Sachin Mehta, Vaishaal Shankar, Raviteja Vemulapalli, Samy Bengio, Oncel Tuzel, Mehrdad Farajtabar, Hadi Pouransari, Fartash Faghri. 32231-32273 [doi]
- Entailed Between the Lines: Incorporating Implication into NLIShreya Havaldar, Hamidreza Alvari, John Palowitch, Mohammad Javad Hosseini, Senaka Buthpitiya, Alex Fabrikant. 32274-32290 [doi]
- Multi-Level Explanations for Generative Language ModelsLucas Monteiro Paes, Dennis Wei, Hyo Jin Do, Hendrik Strobelt, Ronny Luss, Amit Dhurandhar, Manish Nagireddy, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Werner Geyer, Soumya Ghosh. 32291-32317 [doi]
- A Multi-Agent Framework for Mitigating Dialect Biases in Privacy Policy Question-Answering SystemsDorde Klisura, Astrid R. Bernaga Torres, Anna Karen Gárate-Escamilla, Rajesh Roshan Biswal, Ke Yang, Hilal Pataci, Anthony Rios. 32318-32337 [doi]
- Low-Bit Quantization Favors Undertrained LLMsXu Ouyang, Tao Ge 0005, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu 0001. 32338-32348 [doi]
- LETS-C: Leveraging Text Embedding for Time Series ClassificationRachneet Kaur, Zhen Zeng, Tucker Balch, Manuela Veloso. 32365-32399 [doi]
- UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban SpacesBaining Zhao, Jianjie Fang, Zichao Dai, Ziyou Wang, Jirong Zha, Weichen Zhang, Chen Gao, Yue Wang, Jinqiang Cui, Xinlei Chen, Yong Li. 32400-32423 [doi]
- HELIOS: Harmonizing Early Fusion, Late Fusion, and LLM Reasoning for Multi-Granular Table-Text RetrievalSungho Park, Joohyung Yun, Jongwuk Lee, Wook-Shin Han. 32424-32444 [doi]
- ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended CapabilitiesAdhiraj Ghosh, Sebastian Dziadzio, Ameya Prabhu, Vishaal Udandarao, Samuel Albanie, Matthias Bethge. 32445-32481 [doi]
- La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin AmericaMaría Grandury, Javier Aula-Blasco, Júlia Falcão, Clémentine Fourrier, Miguel González Saiz, Gonzalo Martínez 0001, Gonzalo Santamaría Gómez, Rodrigo Agerri, Nuria Aldama-García, Luis Chiruzzo, Javier Conde, Helena Gómez-Adorno, Marta Guerrero Nieto, Guido Ivetta, Natàlia López Fuertes, Flor Miriam Plaza del Arco, María-Teresa Martín Valdivia, Helena Montoro Zamorano, Carmen Muñoz Sanz, Pedro Reviriego, Leire Rosado Plaza, Alejandro Vaca Serrano, María Estrella Vallecillo Rodríguez, Jorge Vallego, Irune Zubiaga. 32482-32524 [doi]
- Why Prompt Design Matters and Works: A Complexity Analysis of Prompt Search Space in LLMsXiang Zhang, Juntai Cao, Chenyu You, Dujian Ding. 32525-32555 [doi]
- Energy Considerations of Large Language Model Inference and Efficiency OptimizationsJared Fernandez, Clara Na, Vashisth Tiwari, Yonatan Bisk, Sasha Luccioni, Emma Strubell. 32556-32569 [doi]
- Optimizing Pre-Training Data Mixtures with Mixtures of Data Expert ModelsLior Belenki, Alekh Agarwal, Tianze Shi, Kristina Toutanova. 32570-32587 [doi]
- BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem ProvingRan Xin, Chenguang Xi, Jie Yang, Feng Chen, Hang Wu, Xia Xiao, Yifan Sun, Shen Zheng, Ming Ding. 32588-32599 [doi]
- Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph TranslationFan Yin, Zifeng Wang 0002, I-Hung Hsu, Jun Yan, Ke Jiang, Yanfei Chen, Jindong Gu, Long T. Le, Kai-Wei Chang, Chen-Yu Lee, Hamid Palangi, Tomas Pfister. 32600-32616 [doi]
- Logic-Regularized Verifier Elicits Reasoning from LLMsXinyu Wang 0022, Changzhi Sun, Lian Cheng, Yuanbin Wu, Dell Zhang, Xiaoling Wang, Xuelong Li 0001. 32617-32630 [doi]
- Squeezed Attention: Accelerating Long Context Length LLM InferenceColeman Richard Charles Hooper, Sehoon Kim, Hiva Mohammadzadeh, Monishwaran Maheswaran, Sebastian Zhao, June Paik, Michael W. Mahoney, Kurt Keutzer, Amir Gholami. 32631-32652 [doi]
- LangMark: A Multilingual Dataset for Automatic Post-EditingDiego Velazquez, Mikaela Grace, Konstantinos Karageorgos, Lawrence Carin, Aaron Schliem, Dimitrios Zaikis, Roger Wechsler. 32653-32667 [doi]
- Neural Parameter Search for Slimmer Fine-Tuned Models and Better TransferGuodong Du 0002, Zitao Fang, Jing Li 0034, Junlin Li, Runhua Jiang, Shuyang Yu, Yifei Guo, Yangneng Chen, Sim Kuan Goh, Ho-Kin Tang, Daojing He, Honghai Liu 0001, Min Zhang 0005. 32668-32687 [doi]
- Merge Hijacking: Backdoor Attacks to Model Merging of Large Language ModelsZenghui Yuan, Yangming Xu, Jiawen Shi, Pan Zhou 0001, Lichao Sun 0001. 32688-32703 [doi]
- Where Are We? Evaluating LLM Performance on African LanguagesIfe Adebara, Hawau Olamide Toyin, Nahom Tesfu Ghebremichael, AbdelRahim A. Elmadany, Muhammad Abdul-Mageed. 32704-32731 [doi]
- Beyond Output Matching: Bidirectional Alignment for Enhanced In-Context LearningChengwei Qin, Wenhan Xia, Fangkai Jiao, Chen Chen 0075, Yuchen Hu, Bosheng Ding, Ruirui Chen 0002, Shafiq Joty. 32732-32758 [doi]
- CiteEval: Principle-Driven Citation Evaluation for Source AttributionYumo Xu, Peng Qi 0003, Jifan Chen, Kunlun Liu, Rujun Han, Lan Liu 0004, Bonan Min, Vittorio Castelli, Arshit Gupta, Zhiguo Wang. 32759-32778 [doi]
- HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language ModelMengkang Hu, Tianxing Chen, Qiguang Chen, Yao Mu 0001, Wenqi Shao, Ping Luo 0002. 32779-32798 [doi]
- EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue FrameworkYao Shi, Rongkeng Liang, Yong Xu. 32799-32828 [doi]
- KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive ReasoningPeiqi Sui, Juan Diego Rodriguez, Philippe Laban, Dean Murphy, Joseph P. Dexter, Richard Jean So, Samuel Baker, Pramit Chaudhuri. 32829-32849 [doi]
- Efficient Domain Continual pretraining by Mitigating the Stability GapYiduo Guo, Jie Fu, Huishuai Zhang, Dongyan Zhao 0001. 32850-32870 [doi]
- Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMsFakhraddin Alwajih, Abdellah El Mekki, Samar Mohamed Magdy, AbdelRahim A. Elmadany, Omer Nacar, El Moatez Billah Nagoudi, Reem Abdel-Salam, Hanin Atwany, Youssef Nafea, Abdulfattah Mohammed Yahya, Rahaf Alhamouri, Hamzah A. Alsayadi, Hiba Zayed, Sara Shatnawi, Serry Sibaee, Yasir Ech-Chammakhy, Walid Al-Dhabyani, Marwa Mohamed Ali, Imen Jarraya, Ahmed Oumar El-Shangiti, Aisha Alraeesi, Mohammed Anwar Al-Ghrawi, Abdulrahman S. Al-Batati, Elgizouli Mohamed, Noha Taha Elgindi, Muhammed Saeed, Houdaifa Atou, Issam Ait Yahia, Abdelhak Bouayad, Mohammed Machrouh, Amal Makouar, Dania Alkawi, Mukhtar Mohamed, Safaa Taher Abdelfadil, Amine Ziad Ounnoughene, Rouabhia Anfel, Rwaa Assi, Ahmed Sorkatti, Mohamedou Cheikh Tourad, Anis Koubaa, Ismail Berrada, Mustafa Jarrar, Shady Shehata, Muhammad Abdul-Mageed. 32871-32894 [doi]
- NewsInterview: a Dataset and a Playground to Evaluate LLMs' Grounding Gap via Informational InterviewsAlexander Spangher, Michael Lu, Sriya Kalyan, Hyundong Justin Cho, Tenghao Huang, Weiyan Shi, Jonathan May. 32895-32925 [doi]
- CFBench: A Comprehensive Constraints-Following Benchmark for LLMsTao Zhang, Chenglin Zhu, Yanjun Shen, Wenjing Luo, Yan Zhang, Hao Liang, Fan Yang 0024, Mingan Lin, Yujing Qiao, Weipeng Chen, Bin Cui 0001, Wentao Zhang 0001, Zenan Zhou. 32926-32944 [doi]
- Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian LanguagesAshwin Sankar, Sparsh Jain, Nikhil Narasimhan, Devilal Choudhary, Dhairya Suman, Mohammed Safi Ur Rahman Khan, Anoop Kunchukuttan, Mitesh M. Khapra, Raj Dabre. 32945-32966 [doi]
- CoRe-MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAGYang Tian, Fan Liu 0008, Jingyuan Zhang, Victoria W., Yupeng Hu, Liqiang Nie. 32967-32982 [doi]
- Mapping 1,000+ Language Models via the Log-Likelihood VectorMomose Oyama, Hiroaki Yamagiwa, Yusuke Takase, Hidetoshi Shimodaira. 32983-33038 [doi]
- ConsistencyChecker: Tree-based Evaluation of LLM Generalization CapabilitiesZhaochen Hong, Haofei Yu, Jiaxuan You. 33039-33075 [doi]
- Robust Estimation of Population-Level Effects in Repeated-Measures NLP Experimental DesignsAlejandro Benito-Santos, Adrián Ghajari, Víctor Fresno. 33076-33089 [doi]
- FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality EvaluationFarima Fatahi Bayat, LeChen Zhang, Sheza Munir, Lu Wang. 33090-33110 [doi]
- Training-free LLM Merging for Multi-task LearningZichuan Fu, Xian Wu 0001, Yejing Wang, Wanyu Wang, Shanshan Ye, Hongzhi Yin, Yi Chang 0001, Yefeng Zheng 0001, Xiangyu Zhao 0001. 33111-33124 [doi]
- Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate SelectionMingyu Derek Ma, Yanna Ding, Zijie Huang 0002, Jianxi Gao, Yizhou Sun, Wei Wang 0010. 33125-33144 [doi]
- Comparison-based Active Preference Learning for Multi-dimensional PersonalizationMinhyeon Oh, Seungjoon Lee, Jungseul Ok. 33145-33166 [doi]
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language ModelsSiming Huang, Tianhao Cheng, Jason Klein Liu, Weidi Xu, Jiaran Hao, Liuyihan Song, Yang Xu, Jian Yang 0030, Jiaheng Liu, Chenchen Zhang, Linzheng Chai, Ruifeng Yuan, Xianzhen Luo, Qiufeng Wang, Yuantao Fan, Qingfu Zhu, Zhaoxiang Zhang 0001, Yang Gao 0021, Jie Fu 0001, Qian Liu, Houyi Li, Ge Zhang 0009, Yuan Qi 0001, Yinghui Xu, Wei Chu, Zili Wang. 33167-33193 [doi]
- LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMsChansung Park, Juyong Jiang, Fan Wang, Sayak Paul, Jing Tang. 33194-33215 [doi]
- AmbiK: Dataset of Ambiguous Tasks in Kitchen EnvironmentAnastasiia Ivanova, Eva Bakaeva, Zoya Volovikova, Alexey K. Kovalev, Aleksandr Panov. 33216-33241 [doi]
- SocialCC: Interactive Evaluation for Cultural Competence in Language AgentsJincenzi Wu, Jianxun Lian, Dingdong Wang, Helen M. Meng. 33242-33271 [doi]
- Scalable Vision Language Model Training via High Quality Data CurationHongyuan Dong, Zijian Kang, Weijie Yin, LiangXiao LiangXiao, ChaoFeng ChaoFeng, Ran Jiao. 33272-33293 [doi]
- GRAM: Generative Recommendation via Semantic-aware Multi-granular Late FusionSunkyung Lee 0001, Minjin Choi 0001, Eunseong Choi, Hye-Young Kim, Jongwuk Lee. 33294-33312 [doi]
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMsTao Ji, Bin Guo, Yuanbin Wu, Qipeng Guo, Shenlixing Shenlixing, Chenzhan Chenzhan, Xipeng Qiu, Qi Zhang 0001, Tao Gui. 33313-33328 [doi]
- TETRIS: Optimal Draft Token Selection for Batch Speculative DecodingZhaoxuan Wu, Zijian Zhou, Arun Verma, Alok Prakash, Daniela Rus, Bryan Kian Hsiang Low. 33329-33345 [doi]
- Introducing Verification Task of Set Consistency with Set-Consistency Energy NetworksMooho Song, Hye Ryung Son, Jay Yoon Lee. 33346-33366 [doi]
- Language Models can Subtly Deceive Without Lying: A Case Study on Strategic Phrasing in LegislationAtharvan Dogra, Krishna Pillutla, Ameet Deshpande, Ananya B. Sai, John J. Nay, Tanmay Rajpurohit, Ashwin Kalyan, Balaraman Ravindran. 33367-33390 [doi]
- AfroCS-xs: Creating a Compact, High-Quality, Human-Validated Code-Switched Dataset for African LanguagesKayode Olaleye, Arturo Oncevay, Mathieu Sibue, Nombuyiselo Zondi, Michelle Terblanche, Sibongile Mapikitla, Richard Lastrucci, Charese Smiley, Vukosi Marivate. 33391-33410 [doi]
- Just Go Parallel: Improving the Multilingual Capabilities of Large Language ModelsMuhammad Reza Qorib, Junyi Li, Hwee Tou Ng. 33411-33424 [doi]
- Design Choices for Extending the Context Length of Visual Language ModelsMukai Li, Lei Li 0039, Shansan Gong, Qi Liu 0049. 33425-33438 [doi]