- Understanding Figurative Meaning through Explainable Visual Entailment. Arkadiy Saakyan, Shreyas Kulkarni, Tuhin Chakrabarty, Smaranda Muresan. 1-23
- Benchmarking Distributional Alignment of Large Language Models. Nicole Meister, Carlos Guestrin, Tatsunori Hashimoto. 24-49
- World Models with Hints of Large Language Models for Goal Achieving. Zeyuan Liu, Ziyu Huan, Xiyao Wang, Jiafei Lyu, Jian Tao, Xiu Li 0001, Furong Huang, Huazhe Xu. 50-72
- CogLM: Tracking Cognitive Development of Large Language Models. Xinglin Wang, Peiwen Yuan, Shaoxiong Feng, Yiwei Li 0001, Boyuan Pan, Heda Wang, Yao Hu, Kan Li 0001. 73-87
- Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities. Minh Duc Chu, Zihao He, Rebecca Dorn, Kristina Lerman. 88-111
- Improving Retrospective Language Agents via Joint Policy Gradient Optimization. Xueyang Feng, Bo Lan, Quanyu Dai, Lei Wang 0198, Jiakai Tang, Xu Chen 0017, Zhenhua Dong, Ji-Rong Wen. 112-141
- CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases. Xiangyan Liu, Bo Lan, Zhiyuan Hu, Yang Liu, Zhicheng Zhang, Fei Wang 0037, Michael Qizhe Shieh, Wenmeng Zhou. 142-160
- Instantly Learning Preference Alignment via In-context DPO. Feifan Song 0001, Yuxuan Fan, Xin Zhang, Peiyi Wang, Houfeng Wang. 161-178
- ALTER: Augmentation for Large-Table-Based Reasoning. Han Zhang, Yuheng Ma 0001, Hanfang Yang. 179-198
- What the #?*!: Disentangling Hate Across Target Identities. Yiping Jin, Leo Wanner, Aneesh Moideen Koya. 199-221
- MAD Speech: Measures of Acoustic Diversity of Speech. Matthieu Futeral, Andrea Agostinelli, Marco Tagliasacchi, Neil Zeghidour, Eugene Kharitonov. 222-235
- The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design. Artem Snegirev, Maria Tikhonova, Anna Maksimova, Alena Fenogenova, Aleksandr Abramov. 236-254
- PRACTIQ: A Practical Conversational Text-to-SQL dataset with Ambiguous and Unanswerable Queries. Mingwen Dong, Nischal Ashok Kumar, Yiqun Hu, Anuj Chauhan, Chung-Wei Hang, Shuaichen Chang, Lin Pan 0003, Wuwei Lan, Henghui Zhu, Jiarong Jiang, Patrick Ng, Zhiguo Wang. 255-273
- MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems. Nandan Thakur, Suleman Kazi, Ge Luo 0002, Jimmy Lin, Amin Ahmad. 274-298
- LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs. Do Xuan Long, Ngoc Hai Nguyen, Tiviatis Sim, Hieu Dao, Shafiq Joty, Kenji Kawaguchi, Nancy F. Chen, Min-Yen Kan. 299-330
- The Impact of Visual Information in Chinese Characters: Evaluating Large Models' Ability to Recognize and Utilize Radicals. Xiaofeng Wu, Karl Stratos, Wei Xu. 331-350
- PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from related Example Banks. Soumya Suvra Ghosal, Soumyabrata Pal, Koyel Mukherjee, Dinesh Manocha. 351-365
- Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts. Tingchen Fu, Yupeng Hou, Julian J. McAuley, Rui Yan 0001. 366-384
- Fingerspelling within Sign Language Translation. Garrett Tanzer. 385-464
- MoDS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections. Nishant Balepur, Alexa F. Siu, Nedim Lipka, Franck Dernoncourt, Tong Sun 0005, Jordan Lee Boyd-Graber, Puneet Mathur. 465-491
- Aligning Sentence Simplification with ESL Learner's Proficiency for Language Acquisition. Guanlin Li, Yuki Arase, Noël Crespi. 492-507
- PeerQA: A Scientific Question Answering Dataset from Peer Reviews. Tim Baumgärtner, Ted Briscoe, Iryna Gurevych. 508-544
- ALiiCE: Evaluating Positional Fine-grained Citation Generation. Yilong Xu, Jinhua Gao, Xiaoming Yu, Baolong Bi, Huawei Shen, Xueqi Cheng. 545-561
- An LLM-Based Approach for Insight Generation in Data Analysis. Alberto Sánchez Pérez, Alaa Boukhary, Paolo Papotti, Luis Castejón Lozano, Adam Elwood. 562-582
- WebQuality: A Large-scale Multi-modal Web Page Quality Assessment Dataset with Multiple Scoring Dimensions. Tao Zhang, Yige Wang, ZhuHangyu, Li Xin, Chen Xiang, Tian Hua Zhou, Jin Ma. 583-596
- UFO: A UI-Focused Agent for Windows OS Interaction. Chaoyun Zhang, Liqun Li, Shilin He, Xu Zhang 0024, Bo Qiao 0001, Si-qin, Minghua Ma, Yu Kang 0006, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang 0001, Qi Zhang 0066. 597-622
- Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness. Yoo yeon Sung, Maharshi Gor, Eve Fleisig, Ishani Mondal, Jordan Lee Boyd-Graber. 623-642
- Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation. Liwen Sun, James (Jialun) Zhao, Wenjing Han, Chenyan Xiong. 643-655
- On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs. Nitay Calderon, Roi Reichart. 656-693
- Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward. Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, Yuanhan Zhang, Di Fu, Chunyuan Li, Alexander G. Hauptmann, Yonatan Bisk, Yiming Yang. 694-717
- FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing. James Seale Smith, Chi-Heng Lin, Shikhar Tuli, Haris Jeelani, Shangqian Gao, Yilin Shen, Hongxia Jin, Yen-Chang Hsu. 718-730
- Conformalized Answer Set Prediction for Knowledge Graph Embedding. Yuqicheng Zhu, Nico Potyka, Jiarong Pan, Bo Xiong, Yunjie He, Evgeny Kharlamov, Steffen Staab. 731-750
- Parameter-free and Accessible Prompt Learning to Enhance Adversarial Robustness for Pre-trained Vision-Language Models. Xingran Zhou, Kun Yang, Changtao Miao, Bingyu Hu, Zhuoer Xu, Shiwen Cui, Changhua Meng, Dan Hong. 751-761
- Fine-grained Fallacy Detection with Human Label Variation. Alan Ramponi, Agnese Daffara, Sara Tonelli. 762-784
- Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models. Hila Gonen, Terra Blevins, Alisa Liu, Luke Zettlemoyer, Noah A. Smith. 785-798
- SELFGOAL: Your Language Agents Already Know How to Achieve High-level Goals. Ruihan Yang, Jiangjie Chen, Yikai Zhang, Siyu Yuan, Aili Chen, Kyle Richardson 0001, Yanghua Xiao, Deqing Yang. 799-819
- Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data. Jonas Golde, Patrick Haller 0002, Max Ploner, Fabio Barth, Nicolaas Paul Jedema, Alan Akbik. 820-834
- Learning to Summarize from LLM-generated Feedback. Hwanjun Song, Taewon Yun, Yuho Lee, Jihwan Oh, Gihun Lee, Jason Cai, Hang Su. 835-857
- Hybrid Graphs for Table-and-Text based Question Answering using LLMs. Ankush Agarwal, Chaitanya Devaguptapu, Ganesh S. 858-875
- CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models. Ying Nie, Binwei Yan, Tianyu Guo 0001, Hao Liu, Haoyu Wang 0003, Wei He 0001, Binfan Zheng, Weihao Wang, Qiang Li 0024, Weijian Sun, Yunhe Wang 0001, Dacheng Tao. 876-891
- LLM-Based Explicit Models of Opponents for Multi-Agent Games. Xiaopeng Yu 0001, Wanpeng Zhang 0002, Zongqing Lu 0002. 892-911
- SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters. Yan Yang, Zeguan Xiao, Xin Lu, Hongru Wang 0003, Xuetao Wei, Hailiang Huang, Guanhua Chen 0001, Yun Chen 0007. 912-931
- JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation. Shota Onohara, Atsuyuki Miyai, Yuki Imajuku, Kazuki Egashira, Jeonghun Baek, Xiang Yue, Graham Neubig, Kiyoharu Aizawa. 932-950
- EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction. Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan 0003, Yongliang Shen 0001, Kan Ren, Dongsheng Li 0002, Deqing Yang. 951-972
- Decoding Hate: Exploring Language Models' Reactions to Hate Speech. Paloma Piot, Javier Parapar. 973-990
- Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations. Ziqiao Ma 0001, Zekun Wang 0002, Joyce Chai. 991-1010
- MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation. Langlin Huang, Mengyu Bu, Yang Feng 0004. 1011-1028
- LLM-Human Pipeline for Cultural Grounding of Conversations. Rajkumar Pujari, Dan Goldwasser. 1029-1048
- ACCESS: A Benchmark for Abstract Causal Event Discovery and Reasoning. Vy Vo, Lizhen Qu, Tao Feng 0013, Yuncheng Hua, Xiaoxi Kang, Songhai Fan, Tim Dwyer, Lay-Ki Soon, Gholamreza Haffari. 1049-1074
- Unmasking Implicit Bias: Evaluating Persona-Prompted LLM Responses in Power-Disparate Social Scenarios. Bryan Chen Zhengyu Tan, Roy Ka-Wei Lee. 1075-1108
- GloCOM: A Short Text Neural Topic Model via Global Clustering Context. Quang Duc Nguyen, Tung Nguyen, Duc Anh Nguyen, Linh Ngo Van 0001, Sang Dinh, Thien Huu Nguyen. 1109-1124
- Reversed Attention: On The Gradient Descent Of Attention Layers In GPT. Shahar Katz, Lior Wolf. 1125-1152
- Self-Harmonized Chain of Thought. Ziqi Jin, Wei Lu 0011. 1153-1174
- AnaScore: Understanding Semantic Parallelism in Proportional Analogies. Liyan Wang, Haotong Wang, Yves Lepage. 1175-1188
- Generating Complex Question Decompositions in the Face of Distribution Shifts. Kelvin Han, Claire Gardent. 1189-1211
- Diversify-verify-adapt: Efficient and Robust Retrieval-Augmented Ambiguous Question Answering. Yeonjun In, SungChul Kim, Ryan A. Rossi, Md. Mehrab Tanjim, Tong Yu 0001, Ritwik Sinha, Chanyoung Park 0001. 1212-1233
- Unifying AI Tutor Evaluation: An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors. Kaushal Kumar Maurya, Kv Aditya Srivatsa, Kseniia Petukhova, Ekaterina Kochmar. 1234-1251
- Where is the answer? An empirical study of positional bias for parametric knowledge extraction in language model. Kuniaki Saito, Chen-Yu Lee, Kihyuk Sohn, Yoshitaka Ushiku. 1252-1269
- Evaluating Morphological Compositional Generalization in Large Language Models. Mete Ismayilzada, Defne Circi, Jonne Sälevä, Hale Sirin, Abdullatif Köksal, Bhuwan Dhingra, Antoine Bosselut, Duygu Ataman, Lonneke van der Plas. 1270-1305
- Balancing Forget Quality and Model Utility: A Reverse KL-Divergence Knowledge Distillation Approach for Better Unlearning in LLMs. Bichen Wang, Yuzhe Zi, Yixin Sun, Yanyan Zhao, Bing Qin 0001. 1306-1321
- AgentMove: A Large Language Model based Agentic Framework for Zero-shot Next Location Prediction. Jie Feng 0002, Yuwei Du, Jie Zhao, Yong Li 0008. 1322-1338
- Embedding derived animacy rankings offer insights into the sources of grammatical animacy. Vivian G. Li. 1339-1351
- Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement. Qianyue Wang, Jinwu Hu, Zhengping Li, Yufeng Wang 0004, Daiyuan Li, Yu Hu, Mingkui Tan. 1352-1391
- Little Giants: Synthesizing High-Quality Embedding Data at Scale. Haonan Chen 0005, Liang Wang 0046, Nan Yang 0002, Yutao Zhu 0001, Ziliang Zhao, Furu Wei, Zhicheng Dou. 1392-1411
- Can LLMs Convert Graphs to Text-Attributed Graphs? Zehong Wang, Sidney Liu, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, Yanfang Ye 0001. 1412-1432
- Forest for the Trees: Overarching Prompting Evokes High-Level Reasoning in Large Language Models. Haoran Liao, Shaohua Hu, Zhihao Zhu, Hao He 0007, Yaohui Jin. 1433-1453
- On the Role of Speech Data in Reducing Toxicity Detection Bias. Samuel J. Bell, Mariano Coria Meglioli, Megan Richards, Eduardo Sánchez, Christophe Ropers, Skyler Wang, Adina Williams, Levent Sagun, Marta R. Costa-Jussà. 1454-1468
- ITALIC: An Italian Culture-Aware Natural Language Benchmark. Andrea Seveso, Daniele Potertì, Edoardo Federici, Mario Mezzanzanica, Fabio Mercorio. 1469-1478
- RAP: A Metric for Balancing Repetition and Performance in Open-Source Large Language Models. Donghao Huang, Thanh Son Nguyen, Fiona Liausvia, Zhaoxia Wang. 1479-1496
- Improving Data Annotation for Low-Resource Relation Extraction with Logical Rule-Augmented Collaborative Language Models. Xiyang Liu 0001, Chunming Hu, Richong Zhang, Junfan Chen 0001, Baowen Xu. 1497-1510
- CompAct: Compressed Activations for Memory-Efficient LLM Training. Yara Shamshoum, Nitzan Hodos, Yuval Sieradzki, Assaf Schuster. 1511-1524
- Large Language Models Are Cross-Lingual Knowledge-Free Reasoners. Peng Hu, Sizhe Liu, Changjiang Gao, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang. 1525-1542
- What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering. Federico Errica, Davide Sanvito, Giuseppe Siracusano, Roberto Bifulco. 1543-1558
- Detect, Disambiguate, and Translate: On-Demand Visual Reasoning for Multimodal Machine Translation with Large Vision-Language Models. Danyang Liu, Fanjie Kong, Xiaohang Sun, Dhruva Patil, Avijit Vajpayee, Zhu Liu, Vimal Bhat, Najmeh Sadoughi. 1559-1570
- Mitigating Hallucinations in Multi-modal Large Language Models via Image Token Attention-Guided Decoding. Xinhao Xu, Hui Chen 0013, Mengyao Lyu, Sicheng Zhao, Yizhe Xiong, Zijia Lin, Jungong Han, Guiguang Ding. 1571-1590
- A Multi-modal Large Language Model with Graph-of-Thought for Effective Recommendation. Zixuan Yi, Iadh Ounis. 1591-1606
- Investigating Human Values in Online Communities. Nadav Borenstein, Arnav Arora, Lucie-Aimée Kaffee, Isabelle Augenstein. 1607-1627
- Pointwise Mutual Information as a Performance Gauge for Retrieval-Augmented Generation. Tianyu Liu 0004, Jirui Qi, Paul He, Arianna Bisazza, Mrinmaya Sachan, Ryan Cotterell. 1628-1647
- MATO: A Model-Agnostic Training Optimization for Aspect Sentiment Triplet Extraction. Shaopeng Tang, Lin Li 0001, Xiaohui Tao 0001, Leqi Zhong, Qing Xie 0002. 1648-1662
- Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts. Tong Zhu 0002, Daize Dong, Xiaoye Qu, Jiacheng Ruan, Wenliang Chen, Yu Cheng 0001. 1663-1677
- EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics. Chenwei Wan, Matthieu Labeau, Chloé Clavel. 1678-1695
- ReasVQA: Advancing VideoQA with Imperfect Reasoning Process. Jianxin Liang, Xiaojun Meng, Huishuai Zhang, Yueqian Wang, Jiansheng Wei, Dongyan Zhao 0001. 1696-1709
- Divergent Thoughts toward One Goal: LLM-based Multi-Agent Collaboration System for Electronic Design Automation. Haoyuan Wu, Haisheng Zheng, Zhuolun He, Bei Yu 0001. 1710-1721
- A Survey of QUD Models for Discourse Processing. Yingxue Fu 0001. 1722-1732
- SafetyQuizzer: Timely and Dynamic Evaluation on the Safety of LLMs. ZhiChao Shi, Shaoling Jing, Yi Cheng, Hao Zhang, Yuanzhuo Wang, Jie Zhang, Huawei Shen, Xueqi Cheng. 1733-1747
- Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory. Haoran Li 0003, Wei Fan, Yulin Chen, Cheng Jiayang, Tianshu Chu, Xuebing Zhou, Peizhao Hu, Yangqiu Song. 1748-1766
- Investigating the (De)Composition Capabilities of Large Language Models in Natural-to-Formal Language Conversion. Ziyao Xu 0001, Houfeng Wang. 1767-1783
- Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring. Honglin Mu, Han He, Yuxin Zhou, Yunlong Feng, Yang Xu 0049, Libo Qin 0001, Xiaoming Shi, Zeming Liu, Xudong Han, Qi Shi 0002, Qingfu Zhu, Wanxiang Che. 1784-1799
- VividMed: Vision Language Model with Versatile Visual Grounding for Medicine. Lingxiao Luo, Bingda Tang, Xuanzhong Chen, Rong Han, Ting Chen 0006. 1800-1821
- Mixture of Multimodal Adapters for Sentiment Analysis. Kezhou Chen, Shuo Wang 0008, Huixia Ben, Shengeng Tang, Yanbin Hao. 1822-1833
- The Impact of Inference Acceleration on Bias of LLMs. Elisabeth Kirsten, Ivan Habernal, Vedant Nanda, Muhammad Bilal Zafar. 1834-1853
- AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages. Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, David Ifeoluwa Adelani, Ibrahim Said Ahmad, Saminu Mohammad Aliyu, Paul Röttger, Abigail Oppong, Andiswa Bukula, Chiamaka Ijeoma Chukwuneke, Ebrahim Chekol Jibril, Elyas Abdi Ismail, Esubalew Alemneh, Hagos Tesfahun Gebremichael, Lukman Jibril Aliyu, Meriem Beloucif, Oumaima Hourrane, Rooweither Mabuya, Salomey Osei, Samuel Rutunda, Tadesse Destaw Belay, Tadesse Kebede Guge, Tesfa Tegegne Asfaw, Lilian Diana Awuor Wanzare, Nelson Odhiambo Onyango, Seid Muhie Yimam, Nedjma Ousidhoum. 1854-1871
- Revealing the Barriers of Language Agents in Planning. Jian Xie, Kexun Zhang, Jiangjie Chen, Siyu Yuan, Kai Zhang 0033, Yikai Zhang, Lei Li 0005, Yanghua Xiao. 1872-1888
- You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL. Hideo Kobayashi, Wuwei Lan, Peng Shi 0010, Shuaichen Chang, Jiang Guo, Henghui Zhu, Zhiguo Wang, Patrick Ng. 1889-1901
- Option Symbol Matters: Investigating and Mitigating Multiple-Choice Option Symbol Bias of Large Language Models. Zhen Yang, Ping Jian, Chengzhi Li. 1902-1917
- DAWN-ICL: Strategic Planning of Problem-solving Trajectories for Zero-Shot In-Context Learning. Xinyu Tang 0004, Xiaolei Wang 0005, Xin Zhao 0018, Ji-Rong Wen. 1918-1934
- LLaSA: Large Language and Structured Data Assistant. Yao Xu, Shizhu He, Jiabei Chen, ZengXiangrong, Bingning Wang, Guang Liu, Jun Zhao, Kang Liu. 1935-1946
- Towards Efficient and Multifaceted Computer-assisted Pronunciation Training Leveraging Hierarchical Selective State Space Model and Decoupled Cross-entropy Loss. Fu-An Chao, Berlin Chen. 1947-1961
- Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models. Abhilasha Ravichander, Jillian Fisher, Taylor Sorensen, Ximing Lu, Maria Antoniak, Bill Yuchen Lin, Niloofar Mireshghallah, Chandra Bhagavatula, Yejin Choi 0001. 1962-1978
- An Interpretable and Crosslingual Method for Evaluating Second-Language Dialogues. Rena Gao, Xuetong Wu, Carsten Roever, Jing Wu, Long Lv, Jingxuan Wu, Jey Han Lau. 1979-2008
- From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection. Rupeng Zhang, Haowei Wang, Junjie Wang, Mingyang Li, Yuekai Huang, Dandan Wang, Qing Wang. 2009-2028
- COVE: COntext and VEracity prediction for out-of-context images. Jonathan Tonglet, Gabriel Thiem, Iryna Gurevych. 2029-2049
- Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization. Yang Zhong, Diane J. Litman. 2050-2073
- Language Models are Crossword Solvers. Soumadeep Saha, Sutanoya Chakraborty, Saptarshi Saha, Utpal Garain. 2074-2090
- WHoW: A Cross-domain Approach for Analysing Conversation Moderation. Ming-Bin Chen, Lea Frermann, Jey Han Lau. 2091-2126
- Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Large Multi-modal Models. Joan Nwatu, Oana Ignat, Rada Mihalcea. 2127-2144
- MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation. Satya Krishna Gorti, Ilan Gofman, Zhaoyan Liu, Jiapeng Wu, Noël Vouitsis, Guangwei Yu, Jesse C. Cresswell, Rasa Hosseinzadeh. 2145-2160
- Mitigating Heterogeneity among Factor Tensors via Lie Group Manifolds for Tensor Decomposition Based Temporal Knowledge Graph Embedding. Jiang Li, Xiangdong Su, Guanglai Gao. 2161-2172
- What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length. Lindia Tjuatja, Graham Neubig, Tal Linzen, Sophie Hao. 2173-2186
- WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching. Tianze Luo, Xingchen Miao, Wenbo Duan. 2187-2198
- Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation. Mingqi Gao 0002, Xinyu Hu 0001, Li Lin 0014, Xiaojun Wan 0001. 2199-2222
- Cascading Large Language Models for Salient Event Graph Generation. Xingwei Tan, Yuxiang Zhou, Gabriele Pergola, Yulan He 0001. 2223-2245
- Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models. Artem Vazhentsev, Lyudmila Rvanova, Ivan Lazichny, Alexander Panchenko, Maxim Panov, Timothy Baldwin, Artem Shelmanov. 2246-2262
- How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making? Kenza Benkirane, Jackie Kay, María Pérez-Ortiz 0001. 2263-2288
- From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks. Xiaofeng Zhang, Yihao Quan, Chen Shen 0003, Xiaosong Yuan, Shaotian Yan, Liang Xie 0003, Wenxiao Wang 0001, Chaochen Gu, Hao Tang, Jieping Ye. 2289-2299
- Patent-CR: A Dataset for Patent Claim Revision. Lekang Jiang, Pascal A Scherz, Stefan Goetz. 2300-2314
- MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs. Yuhang Zhou, Giannis Karamanolakis, Victor Soto, Anna Rumshisky, Mayank Kulkarni, Furong Huang, Wei Ai 0002, Jianhua Lu. 2315-2328
- Fine-Tuned LLMs are "Time Capsules" for Tracking Societal Bias Through Books. Sangmitra Madhusudan, Robert Morabito, Skye Reid, Nikta Gohari Sadr, Ali Emami. 2329-2358
- Exploring the Cost-Effectiveness of Perspective Taking in Crowdsourcing Subjective Assessment: A Case Study of Toxicity Detection. Xiaoni Duan, Zhuoyan Li, Chien-Ju Ho, Ming Yin 0001. 2359-2372
- NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models. Abhinav Rao, Akhila Yerukola, Vishwa Shah, Katharina Reinecke, Maarten Sap. 2373-2403
- LiPO: Listwise Preference Optimization through Learning-to-Rank. Tianqi Liu 0002, Zhen Qin 0001, Junru Wu, Jiaming Shen, Misha Khalman, Rishabh Joshi, Yao Zhao, Mohammad Saleh, Simon Baumgartner, Jialu Liu, Peter J. Liu, Xuanhui Wang. 2404-2420
- Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection. Maximilian Spliethöver, Tim Knebler, Fabian Fumagalli, Maximilian Muschalik, Barbara Hammer, Eyke Hüllermeier, Henning Wachsmuth. 2421-2449
- Enhancing Discriminative Representation in Similar Relation Clusters for Few-Shot Continual Relation Extraction. Anh Duc Le, Nam Le Hai, Thanh Xuan Nguyen, Linh Ngo Van 0001, Nguyen Thi Ngoc Diep, Sang Dinh, Thien Huu Nguyen. 2450-2467
- SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning. Jinu Lee, Wonseok Hwang. 2468-2484
- MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference. Zhongwei Wan, Hui Shen, Xin Wang 0120, Che Liu, Zheda Mai, Mi Zhang 0002. 2485-2497
- Language Models Largely Exhibit Human-like Constituent Ordering Preferences. Ada Defne Tur, Gaurav Kamath, Siva Reddy. 2498-2521
- SafeQuant: LLM Safety Analysis via Quantized Gradient Inspection. Sindhu Padakandla, Sadbhavana Babar, Rathod Darshan D, Manohar Kaul. 2522-2536
- Exploring Large Language Models for Effective Rumor Detection on Social Media. Yirong Zeng, Xiao Ding, Bibo Cai, Ting Liu 0001, Bing Qin 0001. 2537-2552
- No Simple Answer to Data Complexity: An Examination of Instance-Level Complexity Metrics for Classification Tasks. Ryan A. Cook, John P. Lalor, Ahmed Abbasi. 2553-2573
- NLI under the Microscope: What Atomic Hypothesis Decomposition Reveals. Neha Srikanth, Rachel Rudinger. 2574-2589
- HISTOIRESMORALES: A French Dataset for Assessing Moral Alignment. Thibaud Leteno, Irina Proskurina, Antoine Gourru, Julien Velcin, Charlotte Laclau, Guillaume Metzler, Christophe Gravier. 2590-2612
- Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment. KwangHee Choi, Eunjung Yeo, Kalvin Chang, Shinji Watanabe 0001, David R. Mortensen. 2613-2628
- SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search. Hanwen Du, Bo Peng, Xia Ning. 2629-2648
- Reliability of Topic Modeling. Kayla Schroeder, Zach Wood-Doughty. 2649-2662
- Style Transfer with Multi-iteration Preference Optimization. Shuai Liu, Jonathan May. 2663-2681
- DTELS: Towards Dynamic Granularity of Timeline Summarization. Chenlong Zhang, Tong Zhou, Pengfei Cao, Zhuoran Jin, Yubo Chen 0001, Kang Liu 0001, Jun Zhao 0001. 2682-2703
- ALERT: An LLM-powered Benchmark for Automatic Evaluation of Recommendation Explanations. Yichuan Li 0001, Xinyang Zhang 0002, Chenwei Zhang, Mao Li, Tianyi Liu, Pei Chen, Yifan Gao 0001, Kyumin Lee, Kaize Ding, Zhengyang Wang, Zhihan Zhang 0001, Jingbo Shang, Xian Li, Trishul Chilimbi. 2704-2719
- DETQUS: Decomposition-Enhanced Transformers for QUery-focused Summarization. Yasir Khan, Xinlei Wu, Sangpil Youm, Justin Ho, Aryaan Shaikh, Jairo Garciga, Rohan Sharma, Bonnie J. Dorr. 2720-2731
- IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models. David Ifeoluwa Adelani, Jessica Ojo, Israel Abebe Azime, Jian Yun Zhuang, Jesujoba Oluwadara Alabi, Xuanli He, Millicent Ochieng, Sara Hooker, Andiswa Bukula, En-Shiun Annie Lee, Chiamaka Ijeoma Chukwuneke, Happy Buzaaba, Blessing K. Sibanda, Godson Koffi Kalipe, Jonathan Mukiibi, Salomon Kabongo Kabenamualu, Foutse Yuehgoh, Mmasibidi Setaka, Lolwethu Ndolela, Nkiruka Odu, Rooweither Mabuya, Salomey Osei, Shamsuddeen Hassan Muhammad, Sokhar Samb, Tadesse Kebede Guge, Tombekai Vangoni Sherman, Pontus Stenetorp. 2732-2757
- The Impact of Domain-Specific Terminology on Machine Translation for Finance in European Languages. Arturo Oncevay, Charese Smiley, Xiaomo Liu. 2758-2775
- Benchmarking Language Model Creativity: A Case Study on Code Generation. Yining Lu, Dixuan Wang, Tianjian Li, Dongwei Jiang, Sanjeev Khudanpur, Meng Jiang 0001, Daniel Khashabi. 2776-2794
- Have LLMs Reopened the Pandora's Box of AI-Generated Fake News? Xinyu Wang, Wenbo Zhang 0008, Sai Dileep Koneru, Hangzhi Guo, Bonam Mingole, S. Shyam Sundar, Sarah Rajtmajer, Amulya Yadav. 2795-2811
- Probe-Free Low-Rank Activation Intervention. Chonghe Jiang, Bao Nguyen, Anthony Man-Cho So, Viet Anh Nguyen. 2812-2824
- FactTrack: Time-Aware World State Tracking in Story Outlines. Zhiheng Lyu, Kevin Yang, Lingpeng Kong, Dan Klein. 2825-2848
- A Bayesian Optimization Approach to Machine Translation Reranking. Julius Cheng, Maike Züfle, Vilém Zouhar, Andreas Vlachos 0001. 2849-2862
- Multi-Conditional Ranking with Large Language Models. Pouya Pezeshkpour, Estevam Hruschka. 2863-2883
- ReGLA: Refining Gated Linear Attention. Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Boxing Chen, Philippe Langlais. 2884-2898
- Intrinsic Bias is Predicted by Pretraining Data and Correlates with Downstream Performance in Vision-Language Encoders. Kshitish Ghate, Isaac Slaughter, Kyra Wilson, Mona T. Diab, Aylin Caliskan. 2899-2915
- Benchmarking Failures in Tool-Augmented Language Models. Eduardo Treviño, Hugo Contant, James Ngai, Graham Neubig, Zora Zhiruo Wang. 2916-2934
- Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework. Reza Averly, Xia Ning. 2935-2951
- Towards Knowledge Checking in Retrieval-augmented Generation: A Representation Perspective. Shenglai Zeng, Jiankun Zhang, Bingheng Li, Yuping Lin, Tianqi Zheng, Dante Everaert, Hanqing Lu, Hui Liu 0031, Yue Xing 0002, Monica Xiao Cheng, Jiliang Tang. 2952-2969
- The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning. Longju Bai, Angana Borah, Oana Ignat, Rada Mihalcea. 2970-2993
- Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison. Tsz Kin Lam, Marco Gaido, Sara Papi, Luisa Bentivogli, Barry Haddow. 2994-3006
- CORRECT: Context- and Reference-Augmented Reasoning and Prompting for Fact-Checking. Delvin Ce Zhang, Dongwon Lee. 3007-3019
- Racing Thoughts: Explaining Contextualization Errors in Large Language Models. Michael A. Lepori, Michael Curtis Mozer, Asma Ghandeharioun. 3020-3036
- DREAM: Improving Video-Text Retrieval Through Relevance-Based Augmentation Using Large Foundation Models. Yimu Wang, Shuai Yuan, Bo Xue, Xiangru Jian, Wei Pang, Mushi Wang, Ning Yu. 3037-3056
- ToW: Thoughts of Words Improve Reasoning in Large Language Models. Zhikun Xu, Ming Shen, Jacob Dineen, Zhaonan Li, Xiao-ye, Shijie Lu, Aswin RRV, Chitta Baral, Ben Zhou. 3057-3075
- A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation. Bairu Hou, Yang Zhang, Jacob Andreas, Shiyu Chang. 3076-3099
- ERAS: Evaluating the Robustness of Chinese NLP Models to Morphological Garden Path Errors. Qinchan Li, Sophie Hao. 3100-3111
- Superlatives in Context: Modeling the Implicit Semantics of Superlatives. Valentina Pyatkin, Bonnie Webber, Ido Dagan, Reut Tsarfaty. 3112-3126
- LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs. Arash Gholami Davoodi, Seyed Pouyan Mousavi Davoudi, Pouya Pezeshkpour. 3127-3140
- Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations. Yong Cao, Haijiang Liu, Arnav Arora, Isabelle Augenstein, Paul Röttger, Daniel Hershcovich. 3141-3154
- Representing Rule-based Chatbots with Transformers. Dan Friedman, Abhishek Panigrahi, Danqi Chen 0001. 3155-3180
- Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models. Michael Hanna 0001, Aaron Mueller. 3181-3203
- Entangled Relations: Leveraging NLI and Meta-analysis to Enhance Biomedical Relation Extraction. William Hogan, Jingbo Shang. 3204-3220
- Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models. Hengyi Wang, Haizhou Shi, Shiwei Tan, Weiyi Qin, Wenyuan Wang, Tunyu Zhang, Akshay Nambi 0001, Tanuja Ganu, Hao Wang 0014. 3221-3241
- WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines. Genta Indra Winata, Frederikus Hudi, Patrick Amadeus Irawan, David Anugraha, Rifki Afina Putri, Yutong Wang, Adam Nohejl, Ubaidillah Ariq Prathama, Nedjma Ousidhoum, Afifa Amriani, Anar Rzayev, Anirban Das, Ashmari Pramodya, Aulia Adila, Bryan Wilie, Candy Olivia Mawalim, Cheng Ching Lam, Daud Abolade, Emmanuele Chersoni, Enrico Santus, Fariz Ikhwantri, Garry Kuwanto, Hanyang Zhao, Haryo Akbarianto Wibowo, Holy Lovenia, Jan Christian Blaise Cruz, Jan Wira Gotama Putra, Junho Myung, Lucky Susanto, Maria Angelica Riera Machin, Marina Zhukova, Michael Anugraha, Muhammad Farid Adilazuarda, Natasha Christabelle Santosa, Peerat Limkonchotiwat, Raj Dabre, Rio Alexander Audino, Samuel Cahyawijaya, Shi-Xiong Zhang, Stephanie Yulia Salim, Yi Zhou 0019, Yinxuan Gui, David Ifeoluwa Adelani, En-Shiun Annie Lee, Shogo Okada, Ayu Purwarianti, Alham Fikri Aji, Taro Watanabe, Derry Tanti Wijaya, Alice Oh, Chong-Wah Ngo. 3242-3264
- Extracting and Understanding the Superficial Knowledge in Alignment. Runjin Chen, Gabriel J. Perin, Xuxi Chen, Xilun Chen, Yan Han, Nina S. T. Hirata, Junyuan Hong, Bhavya Kailkhura. 3265-3280
- Smurfs: Multi-Agent System using Context-Efficient DFSDT for Tool Planning. Junzhi Chen, Juhao Liang, Benyou Wang. 3281-3298
- From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning. Nan Xu, Fei Wang 0060, Sheng Zhang 0012, Hoifung Poon, Muhao Chen. 3299-3324
- Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets. Tianjian Li, Haoran Xu, Weiting Tan, Kenton Murray, Daniel Khashabi. 3325-3343
- LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems. Nan Xu, Xuezhe Ma. 3344-3370
- PAPILLON: Privacy Preservation from Internet-based and Local Language Model Ensembles. Siyan Li, Vethavikashini Chithrra Raghuram, Omar Khattab, Julia Hirschberg, Zhou Yu. 3371-3390
- When2Call: When (not) to Call Tools. Hayley Ross, Ameya Sunil Mahabaleshwarkar, Yoshi Suhara. 3391-3409
- Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference OptimizationZilu Tang, Rajen Chatterjee, Sarthak Garg. 3410-3433 [doi]
- Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification ToolsYilun Hao, Yongchao Chen, Yang Zhang, Chuchu Fan. 3434-3483 [doi]
- Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs? So Young Lee, Russell Scheinberg, Amber Shore, Ameeta Agrawal. 3484-3498 [doi]
- Beyond Benchmarks: Building a Richer Cross-Document Event Coreference Dataset with DecontextualizationJin Zhao, Jingxuan Tu, Bingyang Ye, Xinrui Hu, Nianwen Xue, James Pustejovsky. 3499-3513 [doi]
- Can Unconfident LLM Annotations Be Used for Confident Conclusions? Kristina Gligoric, Tijana Zrnic, Cinoo Lee, Emmanuel J. Candès, Dan Jurafsky. 3514-3533 [doi]
- Beyond End-to-End VLMs: Leveraging Intermediate Text Representations for Superior Flowchart UnderstandingJunyi Ye, Ankan Dash, Wenpeng Yin 0001, Guiling Wang 0001. 3534-3548 [doi]
- Ihquin tlahtouah in Tetelahtzincocah: An annotated, multi-purpose audio and text corpus of Western Sierra Puebla NahuatlRobert Pugh, Cheyenne Wing, María Ximena Juárez Huerta, Angeles Márquez Hernandez, Francis M. Tyers. 3549-3562 [doi]
- Benchmarking Large Language Models on Answering and Explaining Challenging Medical QuestionsHanjie Chen, Zhouxiang Fang, Yash Singla, Mark Dredze. 3563-3599 [doi]
- Unfamiliar Finetuning Examples Control How Language Models HallucinateKatie Kang, Eric Wallace, Claire J. Tomlin, Aviral Kumar, Sergey Levine. 3600-3612 [doi]
- Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM SamplingGuangya Wan, Yuqi Wu, Jie Chen, Sheng Li 0001. 3613-3635 [doi]
- MatViX: Multimodal Information Extraction from Visually Rich ArticlesGhazal Khalighinejad, Sharon Scott, Ollie Liu, Kelly L. Anderson, Rickard Stureborg, Aman Tyagi, Bhuwan Dhingra. 3636-3655 [doi]
- Towards Rationality in Language and Multimodal Agents: A SurveyBowen Jiang, Yangxinyu Xie, Xiaomeng Wang, Yuan Yuan, Zhuoqun Hao, Xinyi Bai, Weijie J. Su, Camillo Jose Taylor, Tanwi Mallick. 3656-3675 [doi]
- CluSanT: Differentially Private and Semantically Coherent Text SanitizationAhmed Musa Awon, Yun Lu, Shera Potka, Alex Thomo. 3676-3693 [doi]
- TurkingBench: A Challenge Benchmark for Web AgentsKevin Xu, Yeganeh Kordi, Tanay Nayak, Adi Asija, Yizhong Wang, Kate Sanders 0002, Adam Byerly, Jingyu Zhang, Benjamin Van Durme, Daniel Khashabi. 3694-3710 [doi]
- CodeTree: Agent-guided Tree Search for Code Generation with Large Language ModelsJierui Li, Hung Le 0003, Yingbo Zhou, Caiming Xiong, Silvio Savarese, Doyen Sahoo. 3711-3726 [doi]
- DPL: Diverse Preference Learning Without A Reference ModelAbhijnan Nath, Andrey Volozin, Saumajit Saha, Albert Nanda, Galina Grunin, Rahul Bhotika, Nikhil Krishnaswamy. 3727-3747 [doi]
- Verifiable by Design: Aligning Language Models to Quote from Pre-Training DataJingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi. 3748-3768 [doi]
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal ModelsZejun Li, Ruipu Luo, Jiwen Zhang, Minghui Qiu, Xuanjing Huang 0001, Zhongyu Wei. 3769-3798 [doi]
- ACCORD: Closing the Commonsense Measurability GapFrançois Roewer-Després, Jinyue Feng, Zining Zhu 0001, Frank Rudzicz. 3799-3829 [doi]
- CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic EnvironmentsKung-Hsiang Huang, Akshara Prabhakar, Sidharth Dhawan, Yixin Mao, Huan Wang 0016, Silvio Savarese, Caiming Xiong, Philippe Laban, Chien-Sheng Wu. 3830-3850 [doi]
- Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space ModelsJuan Pablo Muñoz, Jinjie Yuan, Nilesh Jain. 3851-3863 [doi]
- CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior TherapyMian Zhang, Xianjun Yang, Xinlu Zhang, Travis Labrum, Jamie C. Chiu, Shaun M. Eack, Fei Fang 0001, William Yang Wang, Zhiyu Chen 0002. 3864-3900 [doi]
- An Efficient Gloss-Free Sign Language Translation Using Spatial Configurations and Motion Dynamics with LLMsEui Jun Hwang, Sukmin Cho, Junmyeong Lee, Jong C. Park. 3901-3920 [doi]
- Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design PrototypingRyan Li, Yanzhe Zhang, Diyi Yang. 3921-3955 [doi]
- Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End EngineeringChenglei Si, Yanzhe Zhang, Ryan Li, Zhengyuan Yang, Ruibo Liu, Diyi Yang. 3956-3974 [doi]
- Temporal-Aware Soft Prompt Tuning for Automatic Text DatingHai Wang, Yuzhi Liang, Han Ren. 3975-3987 [doi]
- Sparser Mixture-of-Adapters with Cross-Layer GeneralizationZiyue Li, Tianyi Zhou. 3988-4002 [doi]
- How to Align Multiple Signed Language Corpora for Better Sign-to-Sign Translations? Mert Inan, Yang Zhong, Vidya Ganesh, Malihe Alikhani. 4003-4016 [doi]
- Communication Makes Perfect: Persuasion Dataset Construction via Multi-LLM CommunicationWeicheng Ma, Hefan Zhang, Ivory Yang, Shiyu Ji, Joice Chen, Farnoosh Hashemi, Shubham Mohole, Ethan Gearey, Michael Macy, Saeed Hassanpour, Soroush Vosoughi. 4017-4045 [doi]
- Soft Prompting for Unlearning in Large Language ModelsKaruna Bhaila, Minh-Hao Van, Xintao Wu. 4046-4056 [doi]
- Mutual-pairing Data Augmentation for Fewshot Continual Relation ExtractionNguyen Hoang Anh, Quyen Tran, Thanh Xuan Nguyen, Nguyen Thi Ngoc Diep, Linh Ngo Van 0001, Thien Huu Nguyen, Trung Le 0001. 4057-4075 [doi]
- KMMLU: Measuring Massive Multitask Language Understanding in KoreanGuijin Son, Hanwool Lee, Sungdong Kim, Seungone Kim, Niklas Muennighoff, Taekyoon Choi, Cheonbok Park, Kang Min Yoo, Stella Biderman. 4076-4104 [doi]
- Protecting Privacy in Multimodal Large Language Models with MLLMU-BenchZheyuan Liu 0010, Guangyao Dou, Mengzhao Jia, Zhaoxuan Tan, Qingkai Zeng 0001, Yongle Yuan, Meng Jiang 0001. 4105-4135 [doi]
- LLM4DistReconfig: A Fine-tuned Large Language Model for Power Distribution Network ReconfigurationPanayiotis Christou, Md. Zahidul Islam, Yuzhang Lin, Jingwei Xiong. 4136-4155 [doi]
- WaterPool: A Language Model Watermark Mitigating Trade-Offs among Imperceptibility, Efficacy and RobustnessBaizhou Huang, Xiaojun Wan 0001. 4156-4182 [doi]
- Tricking Retrievers with Influential Tokens: An Efficient Black-Box Corpus Poisoning AttackCheng Wang, Yiwei Wang 0001, Yujun Cai, Bryan Hooi. 4183-4194 [doi]
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-DeterminismYifan Song 0002, Guoyin Wang, Sujian Li, Bill Yuchen Lin. 4195-4206 [doi]
- CVE-Bench: Benchmarking LLM-based Software Engineering Agent's Ability to Repair Real-World CVE VulnerabilitiesPeiran Wang, Xiaogeng Liu, Chaowei Xiao. 4207-4224 [doi]
- PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model PipelinesReya Vir, Shreya Shankar, Harrison Chase, Will Fu-Hinthorn, Aditya G. Parameswaran. 4225-4245 [doi]
- ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue SynthesisZezhong Wang 0004, Xingshan Zeng, Weiwen Liu, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang 0002, Qun Liu 0001, Kam-Fai Wong. 4246-4263 [doi]
- Fighting Spurious Correlations in Text Classification via a Causal Learning PerspectiveYuqing Zhou, Ziwei Zhu 0001. 4264-4274 [doi]
- Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational RetrievalYu Xia 0007, Junda Wu, SungChul Kim, Tong Yu 0001, Ryan A. Rossi, Haoliang Wang, Julian J. McAuley. 4275-4286 [doi]
- SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model CompressionXin Wang 0120, Samiul Alam, Zhongwei Wan, Hui Shen, Mi Zhang 0002. 4287-4296 [doi]
- AudioBench: A Universal Benchmark for Audio Large Language ModelsBin Wang 0040, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen. 4297-4316 [doi]
- Efficient Prompting for Continual Adaptation to Missing ModalitiesZirun Guo, Shulei Wang, Wang Lin, Weicai Yan, Yangyang Wu, Tao Jin 0004. 4317-4327 [doi]
- Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5Arkadeep Acharya, Rudra Murthy, Vishwajeet Kumar, Jaydeep Sen. 4328-4348 [doi]
- Retrieval, Reasoning, Re-ranking: A Context-Enriched Framework for Knowledge Graph CompletionMuzhi Li, Cehao Yang, Chengjin Xu, Xuhui Jiang, Yiyan Qi, Jian Guo, Ho-Fung Leung, Irwin King. 4349-4363 [doi]
- See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality BiasJunehyoung Kwon, Mihyeon Kim, Eunju Lee 0003, Juhwan Choi, Youngbin Kim. 4364-4378 [doi]
- Harnessing and Evaluating the Intrinsic Extrapolation Ability of Large Language Models for Vehicle Trajectory PredictionJiawei Liu, Yanjiao Liu, Xun Gong 0007, Tingting Wang, Hong Chen 0003, Yunfeng Hu. 4379-4391 [doi]
- Stronger Models are Not Always Stronger Teachers for Instruction TuningZhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Radha Poovendran. 4392-4405 [doi]
- Efficient and Effective Prompt Tuning via Prompt Decomposition and Compressed Outer ProductPengxiang Lan, Haoyu Xu, Enneng Yang, Yuliang Liang, Guibing Guo, Jianzhe Zhao, Xingwei Wang 0001. 4406-4421 [doi]
- Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within PacksJiancheng Dong, Lei Jiang, Wei Jin, Lu Cheng. 4422-4435 [doi]
- Transferable Post-training via Inverse Value LearningXinyu Lu, Xueru Wen, Yaojie Lu 0001, Bowen Yu 0002, Hongyu Lin, Haiyang Yu 0003, Le Sun 0001, Xianpei Han, Yongbin Li. 4436-4447 [doi]
- FLEX: Expert-level False-Less EXecution Metric for Text-to-SQL BenchmarkHeegyu Kim, Taeyang Jeon, Seunghwan Choi, Seungtaek Choi, Hyunsouk Cho. 4448-4475 [doi]
- AID: Adaptive Integration of Detectors for Safe AI with Language ModelsXinran Wang, Enmao Diao, Qi Le, Jie Ding 0002, Ali Anwar 0001. 4476-4492 [doi]
- SSMLoRA: Enhancing Low-Rank Adaptation with State Space ModelJiayang Yu, Yihang Zhang, Bin Wang, Peiqin Lin, Yongkang Liu, Shi Feng. 4493-4506 [doi]
- Sharpness-Aware Minimization for Topic Models with High-Quality Document RepresentationsTung Nguyen, Tue Le, Hoang Tran Vuong, Quang Duc Nguyen, Duc Anh Nguyen, Linh Ngo Van 0001, Sang Dinh, Thien Huu Nguyen. 4507-4524 [doi]
- C²: Scalable Auto-Feedback for LLM-based Chart GenerationWoosung Koh, Jang Han Yoon, Minhyung Lee, YoungJin Song, Jaegwan Cho, Jaehyun Kang, Taehyeon Kim 0001, Se-Young Yun, Youngjae Yu, Bongshin Lee. 4525-4566 [doi]
- A Top-down Graph-based Tool for Modeling Classical Semantic Maps: A Case Study of Supplementary AdverbsZhu Liu 0005, Cunliang Kong, Ying Liu, Maosong Sun 0001. 4567-4576 [doi]
- UniHGKR: Unified Instruction-aware Heterogeneous Knowledge RetrieversDehai Min, Zhiyang Xu, Guilin Qi, Lifu Huang, Chenyu You. 4577-4594 [doi]
- Improving Model Evaluation using SMART Filtering of Benchmark DatasetsVipul Gupta, Candace Ross, David Pantoja, Rebecca J. Passonneau, Megan Ung, Adina Williams. 4595-4615 [doi]
- Entropy-Based Decoding for Retrieval-Augmented Large Language ModelsZexuan Qiu, Zijing Ou, Bin Wu 0025, Jingjing Li 0007, Aiwei Liu, Irwin King. 4616-4627 [doi]
- What We Talk About When We Talk About LMs: Implicit Paradigm Shifts and the Ship of Language ModelsShengqi Zhu 0002, Jeffrey Rzeszotarski. 4628-4646 [doi]
- Diversity Helps Jailbreak Large Language ModelsWeiliang Zhao, Daniel Ben-Levi, Wei Hao, Junfeng Yang, Chengzhi Mao. 4647-4680 [doi]
- Constrained Decoding with Speculative LookaheadsNishanth Sridhar Nakshatri, Shamik Roy, Rajarshi Das, Suthee Chaidaroon, Leonid Boytsov, Rashmi Gangadharaiah. 4681-4700 [doi]
- DyPCL: Dynamic Phoneme-level Contrastive Learning for Dysarthric Speech RecognitionWonjun Lee, Solee Im, Heejin Do, Yunsu Kim 0001, Jungseul Ok, Gary Lee 0001. 4701-4712 [doi]
- Revisiting Early Detection of Sexual Predators via Turn-level OptimizationJinmyeong An, Sangwon Ryu, Heejin Do, Yunsu Kim 0001, Jungseul Ok, Gary Lee 0001. 4713-4724 [doi]
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style DiffusionYinghao Aaron Li, Xilin Jiang, Cong Han, Nima Mesgarani. 4725-4744 [doi]
- Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented GenerationSatyapriya Krishna, Kalpesh Krishna, Anhad Mohananey, Steven Schwarcz, Adam Stambler, Shyam Upadhyay, Manaal Faruqui. 4745-4759 [doi]
- ReachAgent: Enhancing Mobile Agent via Page Reaching and OperationQinzhuo Wu, Wei Liu 0005, Jian Luan 0001, Bin Wang 0004. 4760-4775 [doi]
- Learning to Solve Domain-Specific Calculation Problems with Knowledge-Intensive Programs GeneratorChengyuan Liu, Shihang Wang, Lizhi Qing, Jun Lin, Ji Zhang, Fei Wu 0001, Kun Kuang. 4776-4791 [doi]
- SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity MixtureJiayi Han, Liang Du, Hongwei Du, Xiangguo Zhou, Yiwen Wu, Yuanfang Zhang, Weibo Zheng, Donghong Han. 4792-4804 [doi]
- MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient EvaluationJinsheng Huang, Liang Chen 0024, Taian Guo, Fu Zeng, Yusheng Zhao, Bohan Wu, Ye Yuan 0016, Haozhe Zhao, Zhihui Guo, Yichi Zhang, Jingyang Yuan, Wei Ju, Luchen Liu, Tianyu Liu 0001, Baobao Chang, Ming Zhang 0004. 4805-4822 [doi]
- MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM FinetuningHanqing Wang 0003, Yixia Li, Shuo Wang, Guanhua Chen 0001, Yun Chen 0007. 4823-4836 [doi]
- Analyzing (In)Abilities of SAEs via Formal LanguagesAbhinav Menon, Manish Shrivastava 0001, David Krueger 0001, Ekdeep Singh Lubana. 4837-4862 [doi]
- Multimodal Cognitive Reframing Therapy via Multi-hop Psychotherapeutic ReasoningSubin Kim, Hoonrae Kim, Heejin Do, Gary Lee 0001. 4863-4880 [doi]
- Explanation based In-Context Demonstrations Retrieval for Multilingual Grammatical Error CorrectionWei Li 0101, Wen Luo 0001, Guangyue Peng, Houfeng Wang. 4881-4897 [doi]
- A Unified Supervised and Unsupervised Dialogue Topic Segmentation Framework Based on Utterance Pair ModelingShihao Yang, Ziyi Zhang, Yue Jiang, Chunsheng Qin, Shuhua Liu. 4898-4908 [doi]
- Evaluating Small Language Models for News Summarization: Implications and Factors Influencing PerformanceBorui Xu, Yao Chen 0008, Zeyi Wen, Weiguo Liu, Bingsheng He. 4909-4922 [doi]
- Dynamic Fisher-weighted Model Merging via Bayesian OptimizationSanwoo Lee, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Yunfang Wu. 4923-4935 [doi]
- AI-Assisted Human Evaluation of Machine TranslationVilém Zouhar, Tom Kocmi, Mrinmaya Sachan. 4936-4950 [doi]
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample CriteriaWentao Ge, Shunian Chen, Hardy Chen, Nuo Chen 0002, Junying Chen, Zhihong Chen, Wenya Xie, Shuo Yan, Chenghao Zhu, Ziyue Lin, Dingjie Song, Xidong Wang, Anningzhe Gao, Zhiyi Zhang 0007, Jianquan Li, Xiang Wan, Benyou Wang. 4951-4974 [doi]
- AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive ScenariosXinyi Mou, Jingcong Liang, Jiayu Lin, Xinnong Zhang, Xiawei Liu, Shiyue Yang, Rong Ye, Lei Chen, Haoyu Kuang, Xuanjing Huang 0001, Zhongyu Wei. 4975-5001 [doi]
- FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop DataDeren Lei, Yaxi Li, Siyao Li, Mengya Hu, Rui Xu, Ken Archer, Mingyu Wang, Emily Ching, Alex Deng. 5002-5020 [doi]
- Label Drop for Multi-Aspect Relation Modeling in Universal Information ExtractionLu Yang 0008, Jiajia Li, En Ci, Lefei Zhang, Zuchao Li, Ping Wang 0028. 5021-5040 [doi]
- Test-Time Code-Switching for Cross-lingual Aspect Sentiment Triplet ExtractionDongming Sheng, Kexin Han, Hao Li, Yan Zhang, Yucheng Huang, Jun Lang, Wenqiang Liu. 5041-5053 [doi]
- VisCGEC: Benchmarking the Visual Chinese Grammatical Error CorrectionXiaoman Wang, Dan Yuan, Xin Liu, Yike Zhao, Xiaoxiao Zhang, Xizhi Chen, Yunshi Lan. 5054-5068 [doi]
- Are We Done with MMLU? Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao 0043, Xiaotang Du, Mohammad Reza Ghasemi Madani, Claire Barale, Robert McHardy, Joshua Harris, Jean Kaddour, Emile van Krieken, Pasquale Minervini. 5069-5096 [doi]
- MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool CallingYakun Zhu, Shaohang Wei, Xu Wang, Kui Xue, Shaoting Zhang 0001, Xiaofan Zhang 0002. 5097-5116 [doi]
- Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation EngineeringYu Zhao 0043, Alessio Devoto, Giwon Hong, Xiaotang Du, Aryo Pradipta Gema, Hongru Wang 0003, Xuanli He, Kam-Fai Wong, Pasquale Minervini. 5117-5136 [doi]
- MoDification: Mixture of Depths Made EasyChen Zhang 0020, Meizhi Zhong, Qimeng Wang, Xuantao Lu, Zheyu Ye, Chengqiang Lu, Yan Gao 0017, Yao Hu 0002, Kehai Chen, Min Zhang 0005, Dawei Song 0001. 5137-5149 [doi]
- On the Vulnerability of Text SanitizationMeng Tong, Kejiang Chen, Xiaojian Yuan, Jiayang Liu, Weiming Zhang 0001, Nenghai Yu, Jie Zhang 0073. 5150-5164 [doi]
- Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language ModelsAmey Hengle, Prasoon Bajpai, Soham Dan, Tanmoy Chakraborty 0002. 5165-5180 [doi]
- Verify-in-the-Graph: Entity Disambiguation Enhancement for Complex Claim Verification with Interactive Graph RepresentationHoang Pham, Thanh-Do Nguyen, Khac-Hoai Nam Bui. 5181-5197 [doi]
- Exploring the Potential of Large Language Models for Heterophilic GraphsYuxia Wu, Shujie Li 0003, Yuan Fang 0001, Chuan Shi 0001. 5198-5211 [doi]
- Exploiting Edited Large Language Models as General Scientific OptimizersQitan Lv, Tianyu Liu, Hong Wang. 5212-5237 [doi]
- DIRAS: Efficient LLM Annotation of Document Relevance for Retrieval Augmented GenerationJingwei Ni, Tobias Schimanski, Meihong Lin, Mrinmaya Sachan, Elliott Ash, Markus Leippold. 5238-5258 [doi]
- Hello Again! LLM-powered Personalized Agent for Long-term DialogueHao Li, Chenghao Yang, An Zhang 0003, Yang Deng 0002, Xiang Wang 0010, Tat-Seng Chua. 5259-5276 [doi]
- My LLM might Mimic AAE - But When Should It? Sandra Sandoval, Christabel Acquaye, Kwesi A. Cobbina, Mohammad Nayeem Teli, Hal Daumé III. 5277-5302 [doi]
- High-Dimension Human Value Representation in Large Language ModelsSamuel Cahyawijaya, Delong Chen, Yejin Bang, Leila Khalatbari, Bryan Wilie, Ziwei Ji, Etsuko Ishii, Pascale Fung. 5303-5330 [doi]
- Not all Hallucinations are Good to Throw Away When it Comes to Legal Abstractive SummarizationNihed Bendahman, Karen Pinel-Sauvagnat, Gilles Hubert 0001, Mokhtar Boumedyen Billami. 5331-5344 [doi]
- Query-focused Referentiability Learning for Zero-shot RetrievalJaeyoung Kim, Dohyeon Lee, Seung-won Hwang. 5345-5358 [doi]
- A Novel Computational Modeling Foundation for Automatic Coherence AssessmentAviya Maimon. 5359-5377 [doi]
- Token-based Decision Criteria Are Suboptimal in In-context LearningHakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue. 5378-5401 [doi]
- CSEval: Towards Automated, Multi-Dimensional, and Reference-Free Counterspeech Evaluation using Auto-Calibrated LLMsAmey Hengle, Aswini Kumar Padhi, Anil Bandhakavi, Tanmoy Chakraborty 0002. 5402-5419 [doi]
- Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical StudyMenglong Cui, Pengzhi Gao, Wei Liu 0005, Jian Luan 0001, Bin Wang 0004. 5420-5443 [doi]
- RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language ModelsBang An, Shiyue Zhang, Mark Dredze. 5444-5474 [doi]
- Evaluating Evidence Attribution in Generated Fact Checking ExplanationsRui Xing, Timothy Baldwin, Jey Han Lau. 5475-5496 [doi]
- ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information CoverageTaewhoo Lee, Chanwoong Yoon, Kyochul Jang, Donghyeon Lee, Minju Song, Hyunjae Kim, Jaewoo Kang. 5497-5512 [doi]
- Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' PosteriorsGeorgios Chochlakis, Alexandros Potamianos, Kristina Lerman, Shrikanth Narayanan. 5513-5528 [doi]
- Arabic Dataset for LLM Safeguard EvaluationYasser Ashraf, Yuxia Wang, Bin Gu, Preslav Nakov, Timothy Baldwin. 5529-5546 [doi]
- Anticipating Future with Large Language Model for Simultaneous Machine TranslationSiqi Ouyang, Oleksii Hrinchuk, Zhehuai Chen, Vitaly Lavrukhin, Jagadeesh Balam, Lei Li 0005, Boris Ginsburg. 5547-5557 [doi]
- GuideLLM: Exploring LLM-Guided Conversation with Applications in Autobiography InterviewingJinhao Duan, Xinyu Zhao, Zhuoxuan Zhang, Eunhye Grace Ko, Lily Boddy, Chenan Wang, Tianhao Li, Alexander Rasgon, Junyuan Hong, Min Kyung Lee, Chenxi Yuan, Qi Long, Ying Ding 0001, Tianlong Chen, Kaidi Xu. 5558-5588 [doi]
- Fine-Tuning Large Language Models with Sequential InstructionsHanxu Hu, Simon Yu, Pinzhen Chen, Edoardo M. Ponti. 5589-5610 [doi]
- Diverse In-Context Example Selection After Decomposing Programs and Aligned Utterances Improves Semantic ParsingMayank Kothyari, Sunita Sarawagi, Soumen Chakrabarti, Gaurav Arora, Srujana Merugu. 5611-5629 [doi]
- Elevating Legal LLM Responses: Harnessing Trainable Logical Structures and Semantic Knowledge with Legal ReasoningRujing Yao, Yang Wu, Chenghao Wang, Jingwei Xiong, Fang Wang, Xiaozhong Liu. 5630-5642 [doi]
- Efficient One-shot Compression via Low-Rank Local Feature DistillationYaya Sy, Christophe Cerisara, Irina Illina. 5643-5661 [doi]
- Waste Not, Want Not; Recycled Gumbel Noise Improves Consistency in Natural Language GenerationDamien de Mijolla, Hannan Saddiq, Kim Moore. 5662-5686 [doi]
- ConQRet: A New Benchmark for Fine-Grained Automatic Evaluation of Retrieval Augmented Computational ArgumentationKaustubh D. Dhole, Kai Shu, Eugene Agichtein. 5687-5713 [doi]
- SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data AnnotatorsDaniil Moskovskiy, Nikita Sushko, Sergey Pletenev, Elena Tutubalina, Alexander Panchenko. 5714-5733 [doi]
- BEMEAE: Moving Beyond Exact Span Match for Event Argument ExtractionEnfa Fane, Md Nayem Uddin, Oghenevovwe Ikumariegbe, Daniyal Kashif, Eduardo Blanco 0002, Steven R. Corman. 5734-5749 [doi]
- uDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data RegimesAbdul Waheed, Karima Kadaoui, Bhiksha Raj, Muhammad Abdul-Mageed. 5750-5767 [doi]
- Iterative Self-Tuning LLMs for Enhanced Jailbreaking CapabilitiesChung-En Sun, Xiaodong Liu 0003, Weiwei Yang, Tsui-Wei Weng, Hao Cheng 0002, Aidan San, Michel Galley, Jianfeng Gao 0001. 5768-5786 [doi]
- VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-TuningYifan Peng, Krishna C. Puvvada, Zhehuai Chen, Piotr Zelasko, He Huang 0012, Kunal Dhawan, Ke Hu, Shinji Watanabe 0001, Jagadeesh Balam, Boris Ginsburg. 5787-5802 [doi]
- Rethinking Word Similarity: Semantic Similarity through Classification ConfusionKaitlyn Zhou, Haishan Gao, Sarah Li Chen, Dan Edelstein, Dan Jurafsky, Chen Shani. 5803-5817 [doi]
- SUNAR: Semantic Uncertainty based Neighborhood Aware Retrieval for Complex QAVenktesh V, Mandeep Rathee, Avishek Anand. 5818-5835 [doi]
- Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question CoverageKaige Xie, Philippe Laban, Prafulla Kumar Choubey, Caiming Xiong, Chien-Sheng Wu. 5836-5849 [doi]
- Stronger Universal and Transferable Attacks by Suppressing RefusalsDavid Huang, Avidan Shah, Alexandre Araujo, David A. Wagner 0001, Chawin Sitawarin. 5850-5876 [doi]
- The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language ModelsSeungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Choi 0001, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, Seonghyeon Ye, Bill Yuchen Lin, Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee 0002, Minjoon Seo. 5877-5919 [doi]
- DreamSync: Aligning Text-to-Image Generation with Image Understanding FeedbackJiao Sun, Deqing Fu, Yushi Hu, Su Wang 0001, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian. 5920-5945 [doi]
- Uncovering Bias in Large Vision-Language Models at Scale with CounterfactualsPhillip Howard, Kathleen C. Fraser, Anahita Bhiwandiwalla, Svetlana Kiritchenko. 5946-5991 [doi]
- AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM GuardrailsShaona Ghosh, Prasoon Varshney, Makesh Narsimhan Sreedhar, Aishwarya Padmakumar, Traian Rebedea, Jibin Rajan Varghese, Christopher Parisien. 5992-6026 [doi]
- UOREX: Towards Uncertainty-Aware Open Relation ExtractionRebii Jamal, Mounir Ourekouch, Mohammed Erradi. 6027-6040 [doi]
- Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-TrainingYuchen Zhuang, Jingfeng Yang 0001, Haoming Jiang, Xin Liu 0039, Kewei Cheng, Sanket Lokegaonkar, Yifan Gao 0001, Qing-ping, Tianyi Liu, Binxuan Huang, Zheng Li 0018, Zhengyang Wang, Pei Chen, Ruijie Wang 0004, Rongzhi Zhang, Nasser Zalmout, Priyanka Nigam, Bing Yin, Chao Zhang 0014. 6041-6068 [doi]
- TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge Internalization with Self-ReflectionShengmin Piao, Sanghyun Park 0003. 6069-6087 [doi]
- VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented GenerationManan Suri, Puneet Mathur, Franck Dernoncourt, Kanika Goswami, Ryan A. Rossi, Dinesh Manocha. 6088-6109 [doi]
- VTechAGP: An Academic-to-General-Audience Text Paraphrase Dataset and Benchmark ModelsMing Cheng, Jiaying Gong, Chenhan Yuan, William A. Ingram, Edward A. Fox, Hoda Eldardiry. 6110-6130 [doi]
- Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse LanguagesJannik Brinkmann, Chris Wendler, Christian Bartelt, Aaron Mueller. 6131-6150 [doi]
- Examining and Adapting Time for Multilingual Classification via Mixture of Temporal ExpertsWeisi Liu, Guangzeng Han, Xiaolei Huang 0002. 6151-6166 [doi]
- FLEURS-ASL: Including American Sign Language in Massively Multilingual Multitask EvaluationGarrett Tanzer. 6167-6191 [doi]
- EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary AlgorithmsSiyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan 0003, Dongsheng Li 0002, Deqing Yang. 6192-6217 [doi]
- EmoCharacter: Evaluating the Emotional Fidelity of Role-Playing Agents in DialoguesQiming Feng, Qiujie Xie, Xiaolong Wang, Qingqiu Li, Yuejie Zhang, Rui Feng 0001, Tao Zhang 0022, Shang Gao 0003. 6218-6240 [doi]
- Language Models can Categorize System Inputs for Performance AnalysisDominic Sobhani, Ruiqi Zhong, Edison Marrese-Taylor, Keisuke Sakaguchi, Yutaka Matsuo. 6241-6257 [doi]
- FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language ModelsXin Guo, Haotian Xia, Zhaowei Liu, Hanyang Cao, Zhi Yang, Zhiqiang Liu, Sizhe Wang, Jinyi Niu, Chuqi Wang, Yanhui Wang, Xiaolong Liang, Xiaoming Huang, Bing Zhu, Zhongyu Wei, Yun Chen, Weining Shen, Liwen Zhang. 6258-6292 [doi]
- Rethinking the Role of LLMs for Document-level Relation Extraction: a Refiner with Task Distribution and Probability FusionFu Zhang, Xinlong Jin, Jingwei Cheng, Hongsen Yu, Huangming Xu. 6293-6312 [doi]
- Decomposition Dilemmas: Does Claim Decomposition Boost or Burden Fact-Checking Performance? Qisheng Hu, Quanyu Long, Wenya Wang. 6313-6336 [doi]
- Model Surgery: Modulating LLM's Behavior Via Simple Parameter EditingHuanqian Wang, Yang Yue, Rui Lu, Jingxin Shi, Andrew Zhao, Shenzhi Wang, Shiji Song, Gao Huang 0001. 6337-6357 [doi]
- Effective Skill Unlearning through Intervention and AbstentionYongce Li, Chung-En Sun, Tsui-Wei Weng. 6358-6371 [doi]
- CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual WorldsLei Wang 0198, Jianxun Lian, Yi Huang, Yanqi Dai, Haoxuan Li 0001, Xu Chen 0017, Xing Xie 0001, Ji-Rong Wen. 6372-6391 [doi]
- A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language ModelsXiujie Song, Mengyue Wu, Kenny Q. Zhu, Chunhao Zhang, Yanyi Chen. 6392-6409 [doi]
- CoME: An Unlearning-based Approach to Conflict-free Model EditingDahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, HeuiSeok Lim. 6410-6422 [doi]
- On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic PhenomenaTarek Naous, Wei Xu. 6423-6443 [doi]
- Adapting Sentence-level Automatic Metrics for Document-level Simplification EvaluationMounica Maddela, Fernando Alva-Manchego. 6444-6459 [doi]
- Decoding Speculative DecodingMinghao Yan, Saurabh Agarwal, Shivaram Venkataraman. 6460-6473 [doi]
- Leveraging LLM For Synchronizing Information Across Multilingual TablesSiddharth Khincha, Tushar Kataria, Ankita Anand, Dan Roth, Vivek Gupta 0001. 6474-6492 [doi]
- ConMeC: A Dataset for Metonymy Resolution with Common NounsSaptarshi Ghosh, Tianyu Jiang. 6493-6509 [doi]
- Self-DC: When to Reason and When to Act? Self Divide-and-Conquer for Compositional Unknown QuestionsHongru Wang 0003, Boyang Xue, Baohang Zhou, Tianhua Zhang, Cunxiang Wang, Huimin Wang, Guanhua Chen 0001, Kam-Fai Wong. 6510-6525 [doi]
- TRANSIENTTABLES: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured TablesAbhilash Reddy Shankarampeta, Harsh Mahajan, Tushar Kataria, Dan Roth, Vivek Gupta 0001. 6526-6544 [doi]
- AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective IntelligenceMinbeom Kim, Hwanhee Lee, Joonsuk Park, Hwaran Lee, Kyomin Jung. 6545-6565 [doi]
- tRAG: Term-level Retrieval-Augmented Generation for Domain-Adaptive RetrievalDohyeon Lee, Jongyoon Kim, Jihyuk Kim, Seung-won Hwang, Joonsuk Park. 6566-6578 [doi]
- JRE-L: Journalist, Reader, and Editor LLMs in the Loop for Science Journalism for the General AudienceGongyao Jiang, Xinran Shi, Qiong Luo 0001. 6579-6594 [doi]
- Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language ModelsZiche Liu, Rui Ke, Yajiao Liu, Feng Jiang 0007, Haizhou Li 0001. 6595-6611 [doi]
- Graph Neural Network Enhanced Retrieval for Question Answering of Large Language ModelsZijian Li 0002, Qingyan Guo, Jiawei Shao, Lei Song, Jiang Bian 0002, Jun Zhang 0004, Rui Wang 0028. 6612-6633 [doi]
- Pula: Training Large Language Models for SetswanaNathan Brown, Vukosi Marivate. 6634-6656 [doi]
- LegalViz: Legal Text Visualization by Text To Diagram GenerationEri Onami, Taiki Miyanishi, Koki Maeda, Shuhei Kurita. 6657-6676 [doi]
- Active Few-Shot Learning for Text ClassificationSaeed Ahmadnia, Arash Yousefi Jordehi, Mahsa Hosseini Khasheh Heyran, Seyed Abolghasem Mirroshandel, Owen Rambow, Cornelia Caragea. 6677-6694 [doi]
- Enhancing Multimodal Entity Linking with Jaccard Distance-based Conditional Contrastive Learning and Contextual Visual AugmentationCong-Duy T. Nguyen, Xiaobao Wu, Thong Thanh Nguyen, Shuai Zhao 0007, Khoi M. Le, Viet Anh Nguyen, Yichao Feng, Anh Tuan Luu. 6695-6708 [doi]
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language ModelsJinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, Sung Ju Hwang. 6709-6738 [doi]
- Logit Separability-Driven Samples and Multiple Class-Related Words Selection for Advancing In-Context LearningZixiao Zhu, Zijian Feng, Hanzhang Zhou, Junlang Qian, Kezhi Mao. 6739-6759 [doi]
- Identifying Emerging Concepts in Large CorporaSibo Ma, Julian Nyarko. 6760-6778 [doi]
- CodeSCM: Causal Analysis for Multi-Modal Code GenerationMukur Gupta, Noopur Bhatt, Suman Jana. 6779-6793 [doi]
- From Distributional to Overton Pluralism: Investigating Large Language Model AlignmentThom Lake, Eunsol Choi, Greg Durrett. 6794-6814 [doi]
- Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism DesignMohan Zhang, Pingzhi Li, Jie Peng, Mufan Qiu, Tianlong Chen. 6815-6825 [doi]
- LibEvolutionEval: A Benchmark and Study for Version-Specific Code GenerationSachit Kuhar, Wasi Uddin Ahmad, Zijian Wang 0002, Nihal Jain, Haifeng Qian, Baishakhi Ray, Murali Krishna Ramanathan, Xiaofei Ma 0001, Anoop Deoras. 6826-6840 [doi]
- Evaluating and Mitigating Object Hallucination in Large Vision-Language Models: Can They Still See Removed Objects?Yixiao He, Haifeng Sun 0001, Pengfei Ren, Jingyu Wang 0001, Huazheng Wang, Qi Qi 0001, Zirui Zhuang, Jing Wang 0039. 6841-6858 [doi]
- Self-Pluralising Culture Alignment for Large Language ModelsShaoyang Xu, Yongqi Leng, Linhao Yu, Deyi Xiong. 6859-6877 [doi]
- K-COMP: Retrieval-Augmented Medical Domain Question Answering With Knowledge-Injected CompressorJeonghun Cho, Gary Lee 0001. 6878-6901 [doi]
- DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students' Hand-Drawn Math ImagesSami Baral, Li Lucy, Ryan Knight, Alice Ng, Luca Soldaini, Neil T. Heffernan, Kyle Lo. 6902-6920 [doi]
- Knowledge Graph Guided Evaluation of Abstention TechniquesKinshuk Vasisht, Navreet Kaur 0002, Danish Pruthi. 6921-6939 [doi]
- Wav2Prompt: End-to-End Speech Prompt Learning and Task-based Fine-tuning for Text-based LLMsKeqi Deng, Guangzhi Sun, Philip C. Woodland. 6940-6956 [doi]
- Legal Judgment Prediction based on Knowledge-enhanced Multi-Task and Multi-Label Text ClassificationAng Li, Yiquan Wu, Ming Cai, Adam Jatowt, Xiang Zhou, Weiming Lu 0001, Changlong Sun, Fei Wu 0001, Kun Kuang. 6957-6970 [doi]
- SPeCtrum: A Grounded Framework for Multidimensional Identity Representation in LLM-Based AgentKeyeun Lee, SeoHyeong Kim, Seolhee Lee, Jinsu Eun, Yena Ko, Hayeon Jeon, Esther Hehsun Kim, Seonghye Cho, Soeun Yang, Eun Mee Kim, Hajin Lim. 6971-6991 [doi]
- Beemo: Benchmark of Expert-edited Machine-generated OutputsEkaterina Artemova, Jason Samuel Lucas, Saranya Venkatraman, Jooyoung Lee, Sergei Tilga, Adaku Uchendu, Vladislav Mikhailov. 6992-7018 [doi]
- SANDWiCH: Semantical Analysis of Neighbours for Disambiguating Words in Context ad HocDaniel Guzman-Olivares, Lara Quijano Sánchez, Federico Liberatore. 7019-7033 [doi]
- Towards Automatic Evaluation for Image TranscreationSimran Khanuja, Vivek Iyer, Xiaoyu He, Graham Neubig. 7034-7047 [doi]
- ImgTrojan: Jailbreaking Vision-Language Models with ONE ImageXijia Tao, Shuai Zhong, Lei Li 0039, Qi Liu 0049, Lingpeng Kong. 7048-7063 [doi]
- RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and RefinementJinhao Jiang, Jiayi Chen 0005, Junyi Li, Ruiyang Ren, Shijie Wang, Xin Zhao 0018, Yang Song 0021, Tao Zhang 0070. 7064-7074 [doi]
- Mitigating Biases of Large Language Models in Stance Detection with Counterfactual Augmented CalibrationAng Li, Jingqian Zhao, Bin Liang 0004, Lin Gui 0003, Hui Wang 0030, Xi Zeng, Xingwei Liang, Kam-Fai Wong, Ruifeng Xu 0001. 7075-7092 [doi]
- Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token PredictionJunlang Qian, Zixiao Zhu, Hanzhang Zhou, Zijian Feng, Zepeng Zhai, Kezhi Mao. 7093-7115 [doi]
- Investigating Hallucinations in Simultaneous Machine Translation: Knowledge Distillation Solution and Components AnalysisDonglei Yu, Xiaomian Kang, Yuchen Liu 0007, Feifei Zhai, Nanchang Cheng, Yu Zhou 0001, Chengqing Zong. 7116-7131 [doi]
- Markov Chain of Thought for Efficient Mathematical ReasoningWen Yang, Minpeng Liao, Kai Fan 0002. 7132-7157 [doi]
- Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation ModelsVarun Gumma, Pranjal A. Chitale, Kalika Bali. 7158-7170 [doi]
- Yeah, Un, Oh: Continuous and Real-time Backchannel Prediction with Fine-tuning of Voice Activity ProjectionKoji Inoue, Divesh Lala, Gabriel Skantze, Tatsuya Kawahara. 7171-7181 [doi]
- Prompt Compression for Large Language Models: A SurveyZongqian Li, Yinhong Liu, Yixuan Su, Nigel Collier. 7182-7195 [doi]
- Goal-Conditioned DPO: Prioritizing Safety in Misaligned InstructionsJoo Bon Maeng, Seongmin Lee 0011, Seokin Seo, Kee-Eung Kim. 7196-7211 [doi]
- K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic ReasoningYadong Zhang, Shaoguang Mao, Tao Ge 0001, Xun Wang 0012, Yan Xia 0005, Man Lan, Furu Wei. 7212-7234 [doi]
- SylloBio-NLI: Evaluating Large Language Models on Biomedical Syllogistic ReasoningMagdalena Wysocka, Danilo S. Carvalho, Oskar Wysocki, Marco Valentino, André Freitas. 7235-7258 [doi]
- The State and Fate of Summarization Datasets: A SurveyNoam Dahan, Gabriel Stanovsky. 7259-7278 [doi]
- MGM: Global Understanding of Audience Overlap Graphs for Predicting the Factuality and the Bias of News MediaMuhammad Arslan Manzoor, Ruihong Zeng, Dilshod Azizov, Preslav Nakov, Shangsong Liang. 7279-7295 [doi]
- A Logical Fallacy-Informed Framework for Argument GenerationLuca Mouchel, Debjit Paul, Shaobo Cui 0006, Robert West 0001, Antoine Bosselut, Boi Faltings. 7296-7314 [doi]
- LLaMA-Berry: Pairwise Optimization for Olympiad-level Mathematical Reasoning via O1-like Monte Carlo Tree SearchDi Zhang, Jianbo Wu, Jingdi Lei, Tong Che, Jiatong Li 0003, Tong Xie, Xiaoshui Huang, Shufei Zhang, Marco Pavone 0001, Yuqiang Li, Wanli Ouyang, Dongzhan Zhou. 7315-7337 [doi]
- Generative Prompt InternalizationHaebin Shin, Lei Ji, Yeyun Gong, Sungdong Kim, Eunbi Choi, Minjoon Seo. 7338-7363 [doi]
- Script-Agnosticism and its Impact on Language Identification for Dravidian LanguagesMilind Agarwal, Joshua Otten, Antonios Anastasopoulos. 7364-7384 [doi]
- NAT: Enhancing Agent Tuning with Negative SamplesRenxi Wang, Xudong Han, Yixuan Zhang, Timothy Baldwin, Haonan Li 0002. 7385-7398 [doi]
- Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve AnomaliesZirui Song, Guangxian Ouyang, Meng Fang, Hongbin Na, Zijing Shi, Zhenhao Chen, Yujie Fu, Zeyu Zhang 0006, Shiyu Jiang, Miao Fang 0001, Ling Chen 0006, Xiuying Chen. 7399-7415 [doi]
- How to Make the Most of LLMs' Grammatical Knowledge for Acceptability JudgmentsYusuke Ide, Yuto Nishida, Justin Vasselli, Miyu Oba, Yusuke Sakai 0010, Hidetaka Kamigaito, Taro Watanabe. 7416-7432 [doi]
- Is Your LLM Outdated? A Deep Look at Temporal GeneralizationChenghao Zhu, Nuo Chen 0002, Yufei Gao, Yunyi Zhang, Prayag Tiwari, Benyou Wang. 7433-7457 [doi]
- Towards a Perspectivist Turn in Argument Quality AssessmentJulia Romberg, Maximilian Maurer, Henning Wachsmuth, Gabriella Lapesa. 7458-7485 [doi]
- A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via VisualizationHaoxin Liu 0001, Chenghao Liu, B. Aditya Prakash. 7486-7518 [doi]
- PlagBench: Exploring the Duality of Large Language Models in Plagiarism Generation and DetectionJooyoung Lee, Toshini Agrawal, Adaku Uchendu, Thai Le, Jinghui Chen, Dongwon Lee 0001. 7519-7534 [doi]
- Commonality and Individuality! Integrating Humor Commonality with Speaker Individuality for Humor RecognitionHaohao Zhu, Xiaokun Zhang 0001, Zeyuan Zeng, Junyu Lu, Zewen Bai, Liang Yang 0003, Hongfei Lin. 7535-7547 [doi]
- CAST: Corpus-Aware Self-similarity Enhanced Topic modellingYanan Ma, Chenghao Xiao, Chenhan Yuan, Sabine N. van der Veer, Lamiece Hassan, Chenghua Lin, Goran Nenadic. 7548-7561 [doi]
- A Zero-Shot Open-Vocabulary Pipeline for Dialogue UnderstandingAbdulfattah Safa, Gözde Gül Sahin. 7562-7579 [doi]
- Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language ModelsSomnath Banerjee 0002, Sayan Layek, Hari Shrawgi, Rajarshi Mandal, Avik Halder, Shanu Kumar, Sagnik Basu, Parag Agrawal, Rima Hazra, Animesh Mukherjee 0001. 7580-7617 [doi]
- Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I ModelsMichael Toker, Ido Galil, Hadas Orgad, Rinon Gal, Yoad Tewel, Gal Chechik, Yonatan Belinkov. 7618-7632 [doi]
- In-Context Learning (and Unlearning) of Length BiasesStephanie Schoch, Yangfeng Ji. 7633-7671 [doi]
- AdTEC: A Unified Benchmark for Evaluating Text Quality in Search Engine AdvertisingPeinan Zhang, Yusuke Sakai 0010, Masato Mita, Hiroki Ouchi, Taro Watanabe. 7672-7691 [doi]
- Empowering Retrieval-based Conversational Recommendation with Contrasting User PreferencesHeejin Kook, Junyoung Kim 0001, Seongmin Park, Jongwuk Lee. 7692-7707 [doi]
- LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling MatricesJung-Hyun Lee, Jeonghoon Kim, June Yong Yang, Se Jung Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee. 7708-7743 [doi]
- Towards Robust Knowledge Representations in Multilingual LLMs for Equivalence and Inheritance based Consistent ReasoningGaurav Arora, Srujana Merugu, Shreya Jain, Vaibhav Saxena. 7744-7762 [doi]
- LLMs as Meta-Reviewers' Assistants: A Case StudyEftekhar Hossain, Sanjeev Kumar Sinha, Naman Bansal, R. Alexander Knipper, Souvika Sarkar, John Salvador, Yash Mahajan, Sri Guttikonda, Mousumi Akter 0001, Md. Mahadi Hassan, Matthew Freestone, Matthew C. Williams Jr., Dongji Feng, Santu Karmaker. 7763-7803 [doi]
- A Survey of NLP Progress in Sino-Tibetan Low-Resource LanguagesShuheng Liu 0002, Michael Best. 7804-7825 [doi]
- Enhancing Language Model Hypernetworks with Restart: A Study on OptimizationYihan Zhang, Jie Fu 0001, Rongrong Ji, Jie Chen 0006. 7826-7838 [doi]
- Functional Lexicon in Subword TokenizationZachary William Hopton, Yves Scherrer, Tanja Samardzic. 7839-7853 [doi]
- Getting More Juice Out of Your Data: Hard Pair Refinement Enhances Visual-Language Models Without Extra DataHaonan Wang, Minbin Huang, Runhui Huang, Lanqing Hong, Hang Xu 0004, Tianyang Hu, Xiaodan Liang, Zhenguo Li, Hong Cheng 0001, Kenji Kawaguchi. 7854-7873 [doi]
- Evaluating the Prompt Steerability of Large Language ModelsErik Miehling, Michael Desmond, Karthikeyan Natesan Ramamurthy, Elizabeth M. Daly, Kush R. Varshney, Eitan Farchi, Pierre Dognin, Jesus Rios, Djallel Bouneffouf 0001, Miao Liu 0001, Prasanna Sattigeri. 7874-7900 [doi]
- A Data-Driven Method for Analyzing and Quantifying Lyrics-Dance Motion RelationshipsKento Watanabe, Masataka Goto. 7901-7916 [doi]
- CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific ConceptsMalvina Nikandrou, Georgios Pantazopoulos, Nikolas Vitsakis, Ioannis Konstas, Alessandro Suglia. 7917-7936 [doi]
- PicPersona-TOD: A Dataset for Personalizing Utterance Style in Task-Oriented Dialogue with Image PersonaJihyun Lee, Yejin Jeon, Seungyeon Seo, Gary Lee 0001. 7937-7958 [doi]
- Scaling LLM Inference Efficiently with Optimized Sample Compute AllocationKexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei Li 0005. 7959-7973 [doi]
- Large Language Models for Persian-English Idiom TranslationSara Rezaeimanesh, Faezeh Hosseini, Yadollah Yaghoobzadeh. 7974-7985 [doi]
- Follow the Beaten Path: The Role of Route Patterns on Vision-Language Navigation Agents Generalization AbilitiesKourosh T. Baghaei, Dieter Pfoser, Antonios Anastasopoulos. 7986-8005 [doi]
- Sneaking Syntax into Transformer Language Models with Tree RegularizationAnanjan Nandi, Christopher D. Manning, Shikhar Murty. 8006-8024 [doi]
- Meta-Cultural Competence: Climbing the Right Hill of Cultural AwarenessSougata Saha, Saurabh Kumar Pandey, Monojit Choudhury. 8025-8042 [doi]
- Reading between the Lines: Can LLMs Identify Cross-Cultural Communication Gaps?Sougata Saha, Saurabh Kumar Pandey, Harshit Gupta, Monojit Choudhury. 8043-8067 [doi]
- HMT: Hierarchical Memory Transformer for Efficient Long Context Language ProcessingZifan He, Yingqi Cao, Zongyue Qin, Neha Prakriya, Yizhou Sun, Jason Cong. 8068-8089 [doi]
- Faux Polyglot: A Study on Information Disparity in Multilingual Large Language ModelsNikhil Sharma, Kenton Murray, Ziang Xiao. 8090-8107 [doi]
- Teaching Models to Balance Resisting and Accepting PersuasionElias Stengel-Eskin, Peter Hase, Mohit Bansal. 8108-8122 [doi]
- Making Language Models Robust Against NegationMohammadHossein Rezaei, Eduardo Blanco 0002. 8123-8142 [doi]
- Through the Lens of History: Methods for Analyzing Temporal Variation in Content and Framing of State-run Chinese NewspapersShijia Liu, David A. Smith. 8143-8172 [doi]
- PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language ModelsMichael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An 0001, Sicheng Zhu, Aakriti Agrawal, Furong Huang. 8173-8190 [doi]
- Towards Operationalizing Right to Data ProtectionAbhinav Java, Simra Shahid, Chirag Agarwal. 8191-8205 [doi]
- Learning vs Retrieval: The Role of In-Context Examples in Regression with Large Language ModelsAliakbar Nafar, Kristen Brent Venable, Parisa KordJamshidi. 8206-8229 [doi]
- GLiREL - Generalist Model for Zero-Shot Relation ExtractionJack Boylan, Chris Hokamp, Demian Gholipour Ghalandari. 8230-8245 [doi]
- ComPO: Community Preferences for Language Model PersonalizationSachin Kumar 0009, Chan Young Park, Yulia Tsvetkov, Noah A. Smith, Hannaneh Hajishirzi. 8246-8279 [doi]
- GroundCocoa: A Benchmark for Evaluating Compositional & Conditional Reasoning in Language ModelsHarsh Kohli, Sachin Kumar, Huan Sun. 8280-8295 [doi]
- ALPACA AGAINST VICUNA: Using LLMs to Uncover Memorization of LLMsAly M. Kassem, Omar Mahmoud, Niloofar Mireshghallah, Hyunwoo Kim 0002, Yulia Tsvetkov, Yejin Choi 0001, Sherif Saad, Santu Rana. 8296-8321 [doi]
- Evaluating Contextualized Representations of (Spanish) Ambiguous Words: A New Lexical Resource and Empirical AnalysisPamela D. Rivière, Anne L. Beatty-Martínez, Sean Trott. 8322-8338 [doi]
- Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC TaskJunjie Wu 0007, Mo Yu, Lemao Liu, Dit-Yan Yeung, Jie Zhou 0016. 8339-8360 [doi]
- FedSpaLLM: Federated Pruning of Large Language ModelsGuangji Bai, Yijiang Li, Zilinghan Li, Liang Zhao 0002, Kibaek Kim. 8361-8373 [doi]
- IHEval: Evaluating Language Models on Following the Instruction HierarchyZhihan Zhang 0001, Shiyang Li, Zixuan Zhang, Xin Liu 0039, Haoming Jiang, Xianfeng Tang, Yifan Gao 0001, Zheng Li 0018, Haodong Wang, Zhaoxuan Tan, Yichuan Li 0001, Qingyu Yin, Bing Yin, Meng Jiang 0001. 8374-8398 [doi]
- Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and BeyondMardhiyah Sanni, Tassallah Abdullahi, Devendra Deepak Kayande, Emmanuel Ayodele, Naome A. Etori, Michael S. Mollel, Moshood Yekini, Chibuzor Okocha, Lukman E. Ismaila, Folafunmi Omofoye, Boluwatife Adeleye Adewale, Tobi Olatunji. 8399-8417 [doi]
- THREAD: Thinking Deeper with Recursive SpawningPhilip Schroeder, Nathaniel Morgan, Hongyin Luo, James R. Glass. 8418-8442 [doi]
- CORG: Generating Answers from Complex, Interrelated ContextsHyunji Lee, Franck Dernoncourt, Trung Bui, Seunghyun Yoon 0002. 8443-8460 [doi]
- Generating Diverse Hypotheses for Inductive ReasoningKang Il Lee, Hyukhun Koh, Dongryeol Lee, Seunghyun Yoon, MinSung Kim, Kyomin Jung. 8461-8474 [doi]
- On the Analysis and Distillation of Emergent Outlier Properties in Pre-trained Language ModelsTianyang Zhao, Kunwar Yashraj Singh, Srikar Appalaraju, Peng Tang, Ying Nian Wu, Li Erran Li. 8475-8507 [doi]
- Open-World Evaluation for Retrieving Diverse PerspectivesHung-Ting Chen, Eunsol Choi. 8508-8528 [doi]
- Analyzing the Inner Workings of Transformers in Compositional GeneralizationRyoma Kumon, Hitomi Yanaka. 8529-8540 [doi]
- Substance Beats Style: Why Beginning Students Fail to Code with LLMsFrancesca Lucchetti, Zixuan Wu, Arjun Guha, Molly Q. Feldman, Carolyn Jane Anderson. 8541-8610 [doi]
- Reverse Thinking Makes LLMs Stronger ReasonersJustin Chih-Yao Chen, Zifeng Wang 0002, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long T. Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister. 8611-8630 [doi]
- Towards Lifelong Dialogue Agents via Timeline-based Memory ManagementKai Tzu-iunn Ong, Namyoung Kim, Minju Gwak, Hyungjoo Chae, Taeyoon Kwon, Yohan Jo, Seung-won Hwang, Dongha Lee 0003, Jinyoung Yeo. 8631-8661 [doi]
- StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel ExamplesAjay Patel, Jiacheng Zhu, Justin Qiu, Zachary Horvitz, Marianna Apidianaki, Kathleen McKeown, Chris Callison-Burch. 8662-8685 [doi]
- FiNE: Filtering and Improving Noisy Data Elaborately with Large Language ModelsJunliang He, Ziyue Fan, Shaohui Kuang, Li Xiaoqing, Kai Song, Yaqian Zhou 0001, Xipeng Qiu. 8686-8707 [doi]
- CAMIEval: Enhancing NLG Evaluation through Multidimensional Comparative Instruction-Following AnalysisZiyue Fan, Junliang He, Li Xiaoqing, Shaohui Kuang, Kai Song, Yaqian Zhou 0001, Xipeng Qiu. 8708-8733 [doi]
- LongLeader: A Comprehensive Leaderboard for Large Language Models in Long-context ScenariosPei Chen, Hongye Jin, Cheng-Che Lee, Rulin Shao, Jingfeng Yang 0001, Mingyu Zhao, Zhaoyu Zhang, Qin Lu, Kaiwen Men, Ning Xie, Huasheng Li, Bing Yin, Han Li, Lingyun Wang. 8734-8750 [doi]
- Language Models Can Infer Action Semantics for Symbolic Planners from Environment FeedbackWang Bill Zhu, Ishika Singh, Robin Jia, Jesse Thomason. 8751-8773 [doi]
- SLM-Mod: Small Language Models Surpass LLMs at Content ModerationXianyang Zhan, Agam Goyal, Yilun Chen, Eshwar Chandrasekharan, Koustuv Saha. 8774-8790 [doi]
- On Positional Bias of Faithfulness for Long-form SummarizationDavid Wan, Jesse Vig, Mohit Bansal, Shafiq Joty. 8791-8810 [doi]
- BPO: Towards Balanced Preference Optimization between Knowledge Breadth and Depth in AlignmentSizhe Wang, Yongqi Tong, Hengyuan Zhang, Dawei Li, Xin Zhang, Tianlong Chen. 8811-8826 [doi]
- UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language ModelsYijiang River Dong, Hongzhou Lin, Mikhail Belkin, Ramón Huerta, Ivan Vulic. 8827-8840 [doi]
- H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on TablesNikhil Abhyankar, Vivek Gupta 0001, Dan Roth, Chandan K. Reddy. 8841-8863 [doi]
- Kill two birds with one stone: generalized and robust AI-generated text detection via dynamic perturbationsYinghan Zhou, Juan Wen, Wanli Peng, Yiming Xue, Ziwei Zhang 0002, Zhengxian Wu. 8864-8875 [doi]
- Vision-Language Models Can Self-Improve Reasoning via ReflectionKanzhi Cheng, Yantao Li 0003, Fangzhi Xu, Jianbing Zhang, Hao Zhou, Yang Liu. 8876-8892 [doi]
- Emergence of Episodic Memory in Transformers: Characterizing Changes in Temporal Structure of Attention Scores During TrainingDeven Mahesh Mistry, Anooshka Bajaj, Yash Aggarwal, Sahaj Singh Maini, Zoran Tiganj. 8893-8911 [doi]
- Knowledge Graph-Guided Retrieval Augmented GenerationXiangrong Zhu, Yuexiang Xie, Yi Liu, Yaliang Li, Wei Hu. 8912-8924 [doi]
- Amphista: Bi-directional Multi-head Decoding for Accelerating LLM InferenceZeping Li, Xinlong Yang, Ziheng Gao, Ji Liu, Guanchen Li, Zhuang Liu, Dong Li 0025, Jinzhang Peng, Lu Tian, Emad Barsoum. 8925-8938 [doi]
- CAVE: Controllable Authorship Verification ExplanationsSahana Ramnath, Kartik Pandey, Elizabeth Boschee, Xiang Ren 0001. 8939-8961 [doi]
- Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based EvaluationDongryeol Lee, Yerin Hwang, Yongil Kim, Joonsuk Park, Kyomin Jung. 8962-8984 [doi]
- Dynamic Uncertainty Ranking: Enhancing Retrieval-Augmented In-Context Learning for Long-Tail Knowledge in LLMsShuyang Yu, Runxue Bao, Parminder Bhatia, Taha A. Kass-Hout, Jiayu Zhou, Cao Xiao. 8985-8997 [doi]
- Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model TrainingSun Ao, Weilin Zhao, Xu Han 0007, Cheng Yang 0002, Xinrong Zhang, Zhiyuan Liu 0001, Chuan Shi 0001, Maosong Sun 0001. 8998-9008 [doi]
- Differentially Private Learning Needs Better Model Initialization and Self-DistillationIvoline C. Ngong, Joseph P. Near, Niloofar Mireshghallah. 9009-9027 [doi]
- Is a Peeled Apple Still Red? Evaluating LLMs' Ability for Conceptual Combination with Property TypeSeokwon Song, Taehyun Lee, Jaewoo Ahn, Jae Hyuk Sung, Gunhee Kim. 9028-9048 [doi]
- CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and SmellsAtharva Naik, Marcus Alenius, Daniel Fried, Carolyn P. Rosé. 9049-9076 [doi]
- KS-Lottery: Finding Certified Lottery Tickets for Multilingual Transfer in Large Language ModelsFei Yuan, Chang Ma, Shuai Yuan, Qiushi Sun, Lei Li 0005. 9077-9090 [doi]
- PA-RAG: RAG Alignment via Multi-Perspective Preference OptimizationJiayi Wu 0001, Hengyi Cai, Lingyong Yan, Hao Sun 0015, Xiang Li 0067, Shuaiqiang Wang, Dawei Yin, Ming Gao 0001. 9091-9112 [doi]
- B⁴: A Black-Box Scrubbing Attack on LLM WatermarksBaizhou Huang, Xiao Pu 0003, Xiaojun Wan 0001. 9113-9126 [doi]
- IMRRF: Integrating Multi-Source Retrieval and Redundancy Filtering for LLM-based Fake News DetectionDayang Li, Fanxiao Li, Bingbing Song, Li Tang, Wei Zhou 0011. 9127-9142 [doi]
- Matina: A Large-Scale 73B Token Persian Text CorpusSara Bourbour Hosseinbeigi, Fatemeh Taherinezhad, Heshaam Faili, Hamed Baghbani, Fatemeh Nadi, Mostafa Amiri. 9143-9157 [doi]
- SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text GenerationSaurabh Kumar Pandey, Sachin Vashistha, Debrup Das, Somak Aditya, Monojit Choudhury. 9158-9176 [doi]
- ManaTTS Persian: a recipe for creating TTS datasets for lower resource languagesMahta Fetrat Qharabagh, Zahra Dehghanian, Hamid R. Rabiee. 9177-9206 [doi]
- CultureInstruct: Curating Multi-Cultural Instructions at ScaleViet-Thanh Pham, Zhuang Li, Lizhen Qu, Gholamreza Haffari. 9207-9228 [doi]
- Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language ModelsLovish Madaan, David Esiobu, Pontus Stenetorp, Barbara Plank, Dieuwke Hupkes. 9229-9242 [doi]
- DenseSSM: State Space Models with Dense Hidden Connection for Efficient Large Language ModelsWei He 0001, Kai Han 0002, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo 0001, Yunhe Wang 0001. 9243-9254 [doi]
- A Mixed-Language Multi-Document News Summarization Dataset and a Graphs-Based Extract-Generate ModelShengxiang Gao, Fang Nan, Yongbing Zhang 0004, Yuxin Huang, Kaiwen Tan, Zhengtao Yu 0001. 9255-9265 [doi]
- Measuring memorization in language models via probabilistic extractionJamie Hayes, Marika Swanberg, Harsh Chaudhari, Itay Yona, Ilia Shumailov, Milad Nasr, Christopher A. Choquette-Choo, Katherine Lee, A. Feder Cooper. 9266-9291 [doi]
- Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal ModelsHao Yang, Lizhen Qu, Ehsan Shareghi, Gholamreza Haffari. 9292-9306 [doi]
- EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language ModelsYunsheng Ni, Chuanjian Liu, Yehui Tang, Kai Han 0002, Yunhe Wang 0001. 9307-9320 [doi]
- Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model AlignmentYuu Jinnai, Tetsuro Morimura, Kaito Ariu, Kenshi Abe. 9321-9347 [doi]
- MAPWise: Evaluating Vision-Language Models for Advanced Map QueriesSrija Mukhopadhyay, Abhishek Rajgaria, Prerana Khatiwada, Manish Shrivastava 0001, Dan Roth, Vivek Gupta 0001. 9348-9378 [doi]
- Pay More Attention to Images: Numerous Images-Oriented Multimodal SummarizationMin Xiao, Junnan Zhu, Feifei Zhai, Chengqing Zong, Yu Zhou 0001. 9379-9392 [doi]
- S²-MAD: Breaking the Token Barrier to Enhance Multi-Agent Debate EfficiencyYuting Zeng, Weizhe Huang, Lei Jiang, Tongxuan Liu, Xitai Jin, Chen Tianying Tiana, Jing Li, XiaoHua Xu. 9393-9408 [doi]
- MASTER: A Multi-Agent System with LLM Specialized MCTSBingzheng Gan, Yufan Zhao, Tianyi Zhang, Jing Huang, Yusu Li, Shu Xian Teo, Changwang Zhang, Wei Shi. 9409-9426 [doi]
- ScreenQA: Large-Scale Question-Answer Pairs Over Mobile App ScreenshotsYu-Chung Hsiao, Fedir Zubach, Gilles Baechler, Srinivas Sunkara, Victor Carbune, Jason Lin, Maria Wang, Yun Zhu, Jindong Chen. 9427-9452 [doi]
- Cross-Lingual and Cross-Cultural Variation in Image DescriptionsUri Berger, Edoardo M. Ponti. 9453-9465 [doi]
- Soft Syntactic Reinforcement for Neural Event ExtractionAnran Hao, Jian Su, Shuo Sun, Teo Yong Sen. 9466-9478 [doi]
- Not All Adapters Matter: Selective Adapter Freezing for Memory-Efficient Fine-Tuning of Language ModelsHyegang Son, Yonglak Son, Changhoon Kim, Young-geun Kim. 9479-9496 [doi]
- Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and EvaluationJaechang Kim 0001, Jinmin Goh, Inseok Hwang 0001, Jaewoong Cho, Jungseul Ok. 9497-9516 [doi]
- TCProF: Time-Complexity Prediction SSL FrameworkJoonghyuk Hahn, Hyeseon Ahn, Jungin Kim, Soohan Lim, Yo-Sub Han. 9517-9542 [doi]
- Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt RefinementSuchae Jeong, Inseong Choi, Youngsik Yun, Jihie Kim. 9543-9573 [doi]
- Behavior-SD: Behaviorally Aware Spoken Dialogue Generation with Large Language ModelsSehun Lee, Kang-wook Kim 0004, Gunhee Kim. 9574-9593 [doi]
- Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language ModelsChaoqun Liu, Wenxuan Zhang 0001, Yiran Zhao 0006, Anh Tuan Luu, Lidong Bing. 9594-9614 [doi]
- AlgoPuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Algorithmic Multimodal PuzzlesDeepanway Ghosal, Vernon Toh, Yew Ken Chia, Soujanya Poria. 9615-9632 [doi]
- Towards Quantifying Commonsense Reasoning with Mechanistic InsightsAbhinav Joshi, Areeb Ahmad, Divyaksh Shukla, Ashutosh Modi. 9633-9660 [doi]
- Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMsAnirudh Phukan, Divyansh, Harshit Kumar Morj, Vaishnavi, Apoorv Saxena, Koustava Goswami. 9661-9675 [doi]
- M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language ModelsRishabh Maheshwary, Vikas Yadav, Hoang Nguyen, Khyati Mahajan, Sathwik Tejaswi Madhusudhan. 9676-9713 [doi]
- Multi³Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language ModelsMinh Duc Bui, Katharina von der Wense, Anne Lauscher. 9714-9731 [doi]
- Grounding Fallacies Misrepresenting Scientific Publications in EvidenceMax Glockner, Yufang Hou 0001, Preslav Nakov, Iryna Gurevych. 9732-9767 [doi]
- Has this Fact been Edited? Detecting Knowledge Edits in Language ModelsPaul Youssef, Zhixue Zhao, Christin Seifert, Jörg Schlötterer. 9768-9784 [doi]
- AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter MergingYiran Zhao 0006, Wenxuan Zhang 0001, Huiming Wang, Kenji Kawaguchi, Lidong Bing. 9785-9800 [doi]
- Coverage-based Fairness in Multi-document SummarizationHaoyuan Li, Yusen Zhang, Rui Zhang, Snigdha Chaturvedi. 9801-9819 [doi]
- Grammar Control in Dialogue Response Generation for Language Learning ChatbotsDominik Glandorf, Peng Cui 0006, Detmar Meurers, Mrinmaya Sachan. 9820-9839 [doi]
- Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural KnowledgeLi Zhou 0010, Taelin Karidi, Wanlong Liu, Nicolas Garneau, Yong Cao, Wenyu Chen 0001, Haizhou Li 0001, Daniel Hershcovich. 9840-9867 [doi]
- Palette of Language Models: A Solver for Controlled Text GenerationZhe Yang, Yi Huang, Yaqin Chen, Xiaoting Wu, Junlan Feng, Chao Deng. 9868-9881 [doi]
- MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent CollaborationDavid Wan, Justin Chih-Yao Chen, Elias Stengel-Eskin, Mohit Bansal. 9882-9901 [doi]
- MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue GenerationJunqing He, Liang Zhu, Rui Wang, Xi Wang, Gholamreza Haffari, Jiaxing Zhang. 9902-9921 [doi]
- Assessing the State of the Art in Scene SegmentationAlbin Zehe, Elisabeth Fischer, Andreas Hotho. 9922-9941 [doi]
- DCE-LLM: Dead Code Elimination with Large Language ModelsMinyu Chen, Guoqiang Li 0001, Ling-I Wu, Ruibang Liu. 9942-9955 [doi]
- Instruct-of-Reflection: Enhancing Large Language Models Iterative Reflection Capabilities via Dynamic-Meta InstructionLiping Liu, Chunhong Zhang, Likang Wu, Chuang Zhao 0002, Zheng Hu 0001, Ming He, Jianping Fan 0007. 9956-9978 [doi]
- Correcting Negative Bias in Large Language Models through Negative Attention Score AlignmentSangwon Yu, Jongyoon Song, Bongkyu Hwang, Hoyoung Kang, Sooah Cho, Junhwa Choi, Seongho Joe, Taehee Lee, Youngjune Gwon, Sungroh Yoon. 9979-10001 [doi]
- MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning StepsXiongtao Zhou, Jie He 0004, Lanyu Chen, Jingyu Li, Haojing Chen, Víctor Gutiérrez-Basulto, Jeff Z. Pan, Hanjie Chen. 10002-10039 [doi]
- CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-ExpertsZhenpeng Su, Xing Wu 0002, Zijia Lin, Yizhe Xiong, Minxuan Lv, Guangyuan Ma, Hui Chen 0013, Songlin Hu 0001, Guiguang Ding. 10040-10055 [doi]
- Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive LanguageAmalie Brogaard Pauli, Isabelle Augenstein, Ira Assent. 10056-10075 [doi]
- MILU: A Multi-task Indic Language Understanding BenchmarkSshubam Verma, Mohammed Safi Ur Rahman Khan, Vishwajeet Kumar, Rudra Murthy, Jaydeep Sen. 10076-10132 [doi]
- AutoEval-ToD: Automated Evaluation of Task-oriented Dialog SystemsArihant Jain, Purav Aggarwal, Rishav Sahay, Chaosheng Dong, Anoop Saladi. 10133-10148 [doi]
- Self-calibration for Language Model Quantization and PruningMiles Williams, George Chrysostomou, Nikolaos Aletras. 10149-10167 [doi]
- Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language ModelsTongxuan Liu, Wenjiang Xu, Weizhe Huang, Yuting Zeng, Jiaxing Wang, Xingyu Wang, Hailong Yang, Jing Li. 10168-10185 [doi]
- IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information RetrievalTingyu Song, Guo Gan, Mingsheng Shang 0001, Yilun Zhao 0001. 10186-10204 [doi]
- QAVA: Query-Agnostic Visual Attack to Large Vision-Language ModelsYudong Zhang 0008, Ruobing Xie, Jiansheng Chen, Xingwu Sun, Zhanhui Kang, Yu Wang 0002. 10205-10218 [doi]
- Evaluating and Improving Graph to Text Generation with Large Language ModelsJie He 0004, Yijun Yang, Wanqiu Long, Deyi Xiong, Víctor Gutiérrez-Basulto, Jeff Z. Pan. 10219-10244 [doi]
- The Plagiarism Singularity ConjectureSriram Ranga, Rui Mao 0010, Erik Cambria, Anupam Chattopadhyay. 10245-10255 [doi]
- Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex ReasoningSungjin Park, Xiao Liu, Yeyun Gong, Edward Choi. 10256-10277 [doi]
- One Unified Model for Diverse Tasks: Emotion Cause Analysis via Self-Promote Cognitive Structure ModelingZhaoxin Yu, Xinglin Xiao, Wenji Mao. 10278-10293 [doi]
- Soft Language Prompts for Language TransferIvan Vykopal, Simon Ostermann 0002, Marián Simko. 10294-10313 [doi]
- PICLe: Pseudo-annotations for In-Context Learning in Low-Resource Named Entity DetectionSepideh Mamooler, Syrielle Montariol, Alexander Mathis, Antoine Bosselut. 10314-10331 [doi]
- Can Large Language Models Invent Algorithms to Improve Themselves?Yoichi Ishibashi, Taro Yano, Masafumi Oyamada. 10332-10363 [doi]
- Simulating Classroom Education with LLM-Empowered AgentsZheyuan Zhang, Daniel Zhang-li, Jifan Yu, Linlu Gong, Jinchang Zhou, Zhanxin Hao, Jianxiao Jiang, Jie Cao, Huiqin Liu, Zhiyuan Liu 0001, Lei Hou 0001, Juanzi Li. 10364-10379 [doi]
- A Grounded Typology of Word ClassesColeman Haley, Sharon Goldwater, Edoardo M. Ponti. 10380-10399 [doi]
- SSH: Sparse Spectrum Adaptation via Discrete Hartley TransformationYixian Shen, Qi Bi, Jia-Hong Huang, Hongyi Zhu, Andy D. Pimentel, Anuj Pathania. 10400-10415 [doi]
- LLM-guided Plan and Retrieval: A Strategic Alignment for Interpretable User Satisfaction Estimation in DialogueSangyeop Kim 0001, Sohhyung Park, Jaewon Jung 0002, Jinseok Kim, Sungzoon Cho. 10416-10430 [doi]
- LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMsSumin An, Junyoung Sung, Wonpyo Park, Chanjun Park, Paul Hongsuck Seo. 10431-10442 [doi]
- A Template Is All You MemeLuke Bates, Peter Ebert Christensen, Preslav Nakov, Iryna Gurevych. 10443-10475 [doi]
- LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?Ján Cegin, Jakub Simko, Peter Brusilovsky. 10476-10496 [doi]
- Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted CaptionsMoran Yanuka, Assaf Ben-Kish, Yonatan Bitton, Idan Szpektor, Raja Giryes. 10497-10518 [doi]
- Self-Training Meets Consistency: Improving LLMs' Reasoning with Consistency-Driven Rationale EvaluationJaehyeok Lee, Keisuke Sakaguchi, JinYeong Bak. 10519-10539 [doi]
- Evaluating Defeasible Reasoning in LLMs with DEFREASINGEmily Allaway, Kathleen McKeown. 10540-10558 [doi]
- Evaluating Input Feature Explanations through a Unified Diagnostic Evaluation FrameworkJingyi Sun, Pepa Atanasova, Isabelle Augenstein. 10559-10577 [doi]
- From Evidence to Belief: A Bayesian Epistemology Approach to Language ModelsMinsu Kim, SangRyul Kim, James Thorne. 10578-10611 [doi]
- Private Synthetic Text Generation with Diffusion ModelsSebastian Ochs 0001, Ivan Habernal. 10612-10626 [doi]
- Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided SamplingYiwen Ding, Zhiheng Xi, Wei He 0024, Lizhuoyuan Lizhuoyuan, Yitao Zhai, Shi Xiaowei, Xunliang Cai, Tao Gui, Qi Zhang 0001, Xuanjing Huang 0001. 10627-10646 [doi]
- FactEval: Evaluating the Robustness of Fact Verification Systems in the Era of Large Language ModelsMamta Mamta, Oana Cocarascu. 10647-10660 [doi]
- Analyzing Memorization in Large Language Models through the Lens of Model AttributionTarun Ram Menta, Susmit Agrawal, Chirag Agarwal. 10661-10689 [doi]
- Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQLBingfeng Chen, Shaobin Shi, Yongqi Luo, Boyan Xu, Ruichu Cai, Zhifeng Hao. 10690-10708 [doi]
- Prototypical Extreme Multi-label Classification with a Dynamic Margin LossKunal Dahiya, Diego Ortego, David Jimenez-Cabello. 10709-10727 [doi]
- MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison FeedbackZonghai Yao, Aditya Parashar, Huixue Zhou, Won-Seok Jang, Feiyun Ouyang, Zhichao Yang 0001, Hong Yu 0001. 10728-10777 [doi]
- Main Predicate and Their Arguments as Explanation Signals For Intent ClassificationSameer Pimparkhede, Pushpak Bhattacharyya. 10778-10789 [doi]
- Handling Missing Entities in Zero-Shot Named Entity Recognition: Integrated Recall and Retrieval AugmentationRuichu Cai, Junhao Lu, Zhongjie Chen, Boyan Xu, Zhifeng Hao. 10790-10802 [doi]
- KMI: A Dataset of Korean Motivational Interviewing Dialogues for PsychotherapyHyunjong Kim, Suyeon Lee, Yeongjae Cho, Eunseo Ryu, Yohan Jo, Suran Seong, Sungzoon Cho. 10803-10828 [doi]
- Automatic Input Rewriting Improves Translation with Large Language ModelsDayeon Ki, Marine Carpuat. 10829-10856 [doi]
- HIGGS: Pushing the Limits of Large Language Model Quantization via the Linearity TheoremVladimir Malinovskii, Andrei Panferov, Ivan Ilin, Han Guo, Peter Richtárik, Dan Alistarh. 10857-10886 [doi]
- The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant UnitsBadr AlKhamissi, Greta Tuckute, Antoine Bosselut, Martin Schrimpf. 10887-10911 [doi]
- MixLLM: Dynamic Routing in Mixed Large Language ModelsXinyuan Wang 0011, Yanchi Liu, Wei Cheng 0002, Xujiang Zhao, Zhengzhang Chen, Wenchao Yu, Yanjie Fu, Haifeng Chen. 10912-10922 [doi]
- Continual Learning in Multilingual Sign Language TranslationShakib Yazdani, Josef van Genabith, Cristina España-Bonet. 10923-10938 [doi]
- Few-Shot Natural Language to First-Order Logic Translation via Code GenerationJunnan Liu. 10939-10960 [doi]
- How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMsRan Zhang, Wei Zhao, Steffen Eger. 10961-10988 [doi]
- PORT: Preference Optimization on Reasoning TracesSalem Lahlou, Abdalgader Abubaker, Hakim Hacid. 10989-11005 [doi]
- Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks?Xuan He, Da Yin, Nanyun Peng 0001. 11006-11046 [doi]
- Fine-Grained Transfer Learning for Harmful Content Detection through Label-Specific Soft Prompt TuningFaeze Ghorbanpour, Viktor Hangya, Alexander Fraser 0001. 11047-11061 [doi]
- A Systematic Examination of Preference Learning through the Lens of Instruction-FollowingJoongwon Kim, Anirudh Goyal, Aston Zhang, Bo Xiong, Rui Hou, Melanie Kambadur, Dhruv Mahajan 0001, Hannaneh Hajishirzi, Liang Tan. 11062-11082 [doi]
- Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication UseMohit Chandra, Siddharth Sriraman, Gaurav Verma, Harneet Singh Khanuja, Jose Suarez Campayo, Zihang Li, Michael L. Birnbaum, Munmun De Choudhury. 11083-11113 [doi]
- Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task SupervisionZhouhang Xie, Tushar Khot, Bhavana Dalvi Mishra, Harshit Surana, Julian J. McAuley, Peter Clark, Bodhisattwa Prasad Majumder. 11114-11134 [doi]
- LLM-Supported Natural Language to Bash TranslationFinnian Westenfelder, Erik Hemberg, Stephen Moskal, Una-May O'Reilly, Silviu Chiricescu. 11135-11147 [doi]
- REL-A.I.: An Interaction-Centered Approach To Measuring Human-LM RelianceKaitlyn Zhou, Jena D. Hwang, Xiang Ren 0001, Nouha Dziri, Dan Jurafsky, Maarten Sap. 11148-11167 [doi]
- Eliciting Critical Reasoning in Retrieval-Augmented Generation via Contrastive ExplanationsLeonardo Ranaldi, Marco Valentino, André Freitas. 11168-11183 [doi]
- A Distributional Perspective on Word Learning in Neural Language ModelsFilippo Ficarra, Ryan Cotterell, Alex Warstadt. 11184-11207 [doi]
- Disentangling language change: sparse autoencoders quantify the semantic evolution of indigeneity in FrenchJacob Matthews, Laurent Dubreuil, Imane Terhmina, Yunci Sun, Matthew Wilkens, Marten Van Schijndel. 11208-11222 [doi]
- Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning LanguagesMax Zuo, Francisco Piedrahita Velez, Xiaochen Li, Michael Littman 0002, Stephen H. Bach. 11223-11240 [doi]
- One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversitySonia K. Murthy, Tomer D. Ullman, Jennifer Hu 0001. 11241-11258 [doi]
- Using Text-Based Causal Inference to Disentangle Factors Influencing Online Review RatingsLinsen Li, Aron Culotta, Nicholas Mattei. 11259-11277 [doi]
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rateXiaomeng Jin, Zhiqi Bu, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Mingyi Hong 0001. 11278-11294 [doi]
- REFFLY: Melody-Constrained Lyrics Editing ModelSongyan Zhao, Bingxuan Li, Yufei Tian, Nanyun Peng 0001. 11295-11315 [doi]
- Exploring Safety-Utility Trade-Offs in Personalized Language ModelsAnvesh Rao Vijjini, Somnath Basu Roy Chowdhury, Snigdha Chaturvedi. 11316-11340 [doi]
- MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart ProblemsZifeng Zhu, Mengzhao Jia, Zhihan Zhang 0001, Lang Li, Meng Jiang 0001. 11341-11359 [doi]
- It Is Not Only the Negative that Deserves Attention! Understanding, Generation & Evaluation of (Positive) ModerationIman Jundi, Eva Maria Vecchi, Carlotta Quensel, Neele Falk, Gabriella Lapesa. 11360-11395 [doi]
- Social Norms in Cinema: A Cross-Cultural Analysis of Shame, Pride and PrejudiceSunny Rai, Khushang Jilesh Zaveri, Shreya Havaldar, Soumna Nema, Lyle H. Ungar, Sharath Chandra Guntuku. 11396-11415 [doi]
- The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept UnderstandingMo Yu, Lemao Liu, Junjie Wu 0007, Tsz Ting Chung, Shunchi Zhang, Jiangnan Li, Dit-Yan Yeung, Jie Zhou 0016. 11416-11431 [doi]
- mHumanEval - A Multilingual Benchmark to Evaluate Large Language Models for Code GenerationMd. Nishat Raihan, Antonios Anastasopoulos, Marcos Zampieri. 11432-11461 [doi]
- What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and EvaluationMichal Golovanevsky, William Rudman, Vedant Palit, Carsten Eickhoff, Ritambhara Singh. 11462-11482 [doi]
- Are explicit belief representations necessary? A comparison between Large Language Models and Bayesian probabilistic modelsDingyi Pan, Benjamin K. Bergen 0001. 11483-11498 [doi]
- Self-Generated Critiques Boost Reward Modeling for Language ModelsYue Yu, Zhengxing Chen, Aston Zhang, Liang Tan, Chenguang Zhu 0001, Richard Yuanzhe Pang, Yundi Qian, Xuewei Wang, Suchin Gururangan, Chao Zhang, Melanie Kambadur, Dhruv Mahajan 0001, Rui Hou. 11499-11514 [doi]
- Characterizing the Role of Similarity in the Property Inferences of Language ModelsJuan Diego Rodriguez, Aaron Mueller, Kanishka Misra. 11515-11533 [doi]
- SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized DomainsRan Xu 0002, Hui Liu 0031, Sreyashi Nag, Zhenwei Dai, Yaochen Xie, Xianfeng Tang, Chen Luo 0003, Yang Li 0055, Joyce C. Ho, Carl Yang 0001, Qi He 0002. 11534-11550 [doi]
- Learning to Substitute Words with Model-based Score RankingHongye Liu, Ricardo Henao. 11551-11565 [doi]
- Multilingual Reasoning via Self-trainingLeonardo Ranaldi, Giulia Pucci. 11566-11582 [doi]
- xLAM: A Family of Large Action Models to Empower AI Agent SystemsJianguo Zhang, Tian Lan 0006, Ming Zhu, Zuxin Liu, Thai-Hoang, Shirley Kokane, Weiran Yao, Juntao Tan, Akshara Prabhakar, Haolin Chen, Zhiwei Liu, Yihao Feng, Tulika Manoj Awalgaonkar, Rithesh R. N., Zeyuan Chen 0001, Ran Xu 0001, Juan Carlos Niebles, Shelby Heinecke, Huan Wang 0014, Silvio Savarese, Caiming Xiong. 11583-11597 [doi]
- ProMQA: Question Answering Dataset for Multimodal Procedural Activity UnderstandingKimihiro Hasegawa, Wiradee Imrattanatrai, Zhi-Qi Cheng, Masaki Asada, Susan Holm, Yuran Wang, Ken Fukuda, Teruko Mitamura. 11598-11617 [doi]
- Ethical Concern Identification in NLP: A Corpus of ACL Anthology Ethics StatementsAntonia Karamolegkou, Sandrine Schiller Hansen, Ariadni Christopoulou, Filippos Stamatiou, Anne Lauscher, Anders Søgaard. 11618-11635 [doi]
- AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric KnowledgeHan Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal. 11636-11652 [doi]
- Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math ReasoningYilun Zhao 0001, Guo Gan, Chen Zhao, Arman Cohan. 11653-11665 [doi]
- LBC: Language-Based-Classifier for Out-Of-Variable GeneralizationKangjun Noh, Baekryun Seong, Hoyoon Byun, YoungJun Choi, SungJin Song, Kyungwoo Song. 11666-11678 [doi]
- On the Impact of Fine-Tuning on Chain-of-Thought ReasoningElita A. Lobo, Chirag Agarwal, Himabindu Lakkaraju. 11679-11698 [doi]
- InfoPO: On Mutual Information Maximization for Large Language Model AlignmentTeng Xiao, Zhen Ge, Sujay Sanghavi, Tian Wang, Julian Katz-Samuels, Marc Versage, Qingjun Cui, Trishul Chilimbi. 11699-11711 [doi]
- Is In-Context Learning a Type of Error-Driven Learning? Evidence from the Inverse Frequency Effect in Structural PrimingZhenghao Zhou, Robert Frank 0001, R. Thomas McCoy. 11712-11725 [doi]
- Guiding Medical Vision-Language Models with Diverse Visual Prompts: Framework Design and Comprehensive Exploration of Prompt VariationsKangyu Zhu, Ziyuan Qin 0001, Huahui Yi, Zekun Jiang, Qicheng Lao, Shaoting Zhang 0001, Kang Li 0004. 11726-11739 [doi]
- Analyzing and Improving Coherence of Large Language Models in Question AnsweringIvano Lauriola, Stefano Campese, Alessandro Moschitti. 11740-11755 [doi]
- ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data ValuationYanzhou Pan, Huawei Lin 0001, Yide Ran, Jiamin Chen, Xiaodong Yu, Weijie Zhao 0001, Denghui Zhang, Zhaozhuo Xu. 11756-11771 [doi]
- E-Gen: Leveraging E-Graphs to Improve Continuous Representations of Symbolic ExpressionsHongbo Zheng, Suyuan Wang, Neeraj Gangwar, Nickvash Kani. 11772-11788 [doi]
- Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-SpeechEric Battenberg, R. J. Skerry-Ryan, Daisy Stanton, Soroosh Mariooryad, Matt Shannon, Julian Salazar, David Kao. 11789-11806 [doi]
- PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation MetricsDaniil Larionov, Steffen Eger. 11807-11820 [doi]
- AutoParLLM: GNN-guided Context Generation for Zero-Shot Code Parallelization using LLMsQuazi Ishtiaque Mahmud, Ali TehraniJamsaz, Hung D. Phan, Le Chen, Mihai Capota, Theodore L. Willke, Nesreen K. Ahmed, Ali Jannesari. 11821-11841 [doi]
- Causally Modeling the Linguistic and Social Factors that Predict Email ResponseYinuo Xu, Hong Chen, Sushrita Rakshit, Aparna Ananthasubramaniam, Omkar Yadav, Mingqian Zheng, Michael Jiang, LeChen Zhang, Bowen Yi, Kenan Alkiek, Abraham Israeli, Bangzhao Shu, Hua Shen, Jiaxin Pei, Haotian Zhang, Miriam Schirmer, David Jurgens. 11842-11866 [doi]
- AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM AgentsZhe Su, Xuhui Zhou, Sanketh Rangreji, Anubha Kabra, Julia Mendelsohn, Faeze Brahman, Maarten Sap. 11867-11894 [doi]
- Beyond the Safety Bundle: Auditing the Helpful and Harmless DatasetKhaoula Chehbouni, Jonathan Colaço Carr, Yash More, Jackie CK Cheung, Golnoosh Farnadi. 11895-11925 [doi]
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow InstructionsOrion Weller, Benjamin Chang 0007, Sean MacAvaney, Kyle Lo, Arman Cohan, Benjamin Van Durme, Dawn J. Lawrie, Luca Soldaini. 11926-11942 [doi]
- Few-shot Personalization of LLMs with Mis-aligned ResponsesJaehyung Kim 0001, Yiming Yang. 11943-11974 [doi]
- Prompting with Phonemes: Enhancing LLMs' Multilinguality for Non-Latin Script LanguagesHoang Nguyen, Khyati Mahajan, Vikas Yadav, Julian Salazar, Philip S. Yu, Masoud Hashemi, Rishabh Maheshwary. 11975-11994 [doi]
- SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language ModelsMargaret Mitchell, Giuseppe Attanasio, Ioana Baldini, Miruna Clinciu, Jordan Clive, Pieter Delobelle, Manan Dey, Sil Hamilton, Timm Dill, Jad Doughman, Ritam Dutt, Avijit Ghosh, Jessica Zosa Forde, Carolin Holtermann, Lucie-Aimée Kaffee, Tanmay Laud, Anne Lauscher, Roberto L. Lopez-Davila, Maraim Masoud, Nikita Nangia, Anaelia Ovalle, Giada Pistilli, Dragomir Radev, Beatrice Savoldi, Vipul Raheja, Jeremy Qin, Esther Ploeger, Arjun Subramonian, Kaustubh D. Dhole, Kaiser Sun, Amirbek Djanibekov, Jonibek Mansurov, Kayo Yin, Emilio Villa-Cueva, Sagnik Mukherjee, Jerry Huang, Xudong Shen, Jay Gala, Hamdan Al-Ali, Tair Djanibekov, Nurdaulet Mukhituly, Shangrui Nie, Shanya Sharma, Karolina Stanczak, Eliza Szczechla, Tiago Timponi Torrent, Deepak Tunuguntla, Marcelo Viridiano, Oskar van der Wal, Adina Yakefu, Aurélie Névéol, Mike Zhang, Sydney Zink, Zeerak Talat. 11995-12041 [doi]
- Speculative Diffusion Decoding: Accelerating Language Generation through DiffusionJacob K. Christopher, Brian R. Bartoldson, Tal Ben-Nun, Michael Cardei, Bhavya Kailkhura, Ferdinando Fioretto. 12042-12059 [doi]
- Bayelemabaga: Creating Resources for Bambara NLPAllahsera Auguste Tapo, Kevin Assogba, Christopher M. Homan, M. Mustafa Rafique, Marcos Zampieri. 12060-12070 [doi]
- Single Ground Truth Is Not Enough: Adding Flexibility to Aspect-Based Sentiment Analysis EvaluationSoyoung Yang, Hojun Cho, Jiyoung Lee, Sohee Yoon, Edward Choi, Jaegul Choo, Won-Ik Cho. 12071-12096 [doi]
- DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language ModelsJianyu Liu, Hangyu Guo, Ranjie Duan, Xingyuan Bu, Yancheng He, Shilong Li, Hui Huang, Jiaheng Liu, Yucheng Wang, Chenchen Jing, Xingwei Qu, Xiao Zhang, Pei Wang, Yanan Wu, Jihao Gu, Yangguang Li 0001, Jianke Zhu. 12097-12118 [doi]
- In-Context Learning with Long-Context Models: An In-Depth ExplorationAmanda Bertsch, Maor Ivgi, Emily Xiao, Uri Alon 0002, Jonathan Berant, Matthew R. Gormley, Graham Neubig. 12119-12149 [doi]
- Preference Consistency Matters: Enhancing Preference Learning in Language Models with Automated Self-Curation of Training CorporaJoonho Lee, JuYoun Son, Juree Seok, Wooseok Jang, Yeong-Dae Kwon. 12150-12169 [doi]
- TurtleBench: A Visual Programming Benchmark in Turtle GeometrySina Rismanchian, Yasaman Razeghi, Sameer Singh 0001, Shayan Doroudi. 12170-12188 [doi]
- Automatically Discovering How Misogyny is Framed on Social MediaRakshitha Rao Ailneni, Sanda M. Harabagiu. 12189-12208 [doi]
- Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary EvaluationMahnaz Koupaee, Jake W. Vincent, Saab Mansour, Igor Shalyminov, Han He, Hwanjun Song, Raphael Shu, Jianfeng He, Yi Nian, Amy Wing-mei Wong, Kyu J. Han, Hang Su. 12209-12246 [doi]
- ReIFE: Re-evaluating Instruction-Following EvaluationYixin Liu 0003, Kejian Shi, Alexander R. Fabbri, Yilun Zhao 0001, PeiFeng Wang, Chien-Sheng Wu, Shafiq Joty, Arman Cohan. 12247-12287 [doi]
- Language Models Predict Empathy Gaps Between Social In-groups and Out-groupsYu Hou, Hal Daumé III, Rachel Rudinger. 12288-12304 [doi]
- HARP: Hesitation-Aware Reframing in Transformer Inference PassRomain Storaï, Seung-won Hwang. 12305-12319 [doi]
- JAWAHER: A Multidialectal Dataset of Arabic Proverbs for LLM BenchmarkingSamar Mohamed Magdy, Sang Yun Kwon, Fakhraddin Alwajih, Safaa Taher Abdelfadil, Shady Shehata, Muhammad Abdul-Mageed. 12320-12341 [doi]
- EmojiPrompt: Generative Prompt Obfuscation for Privacy-Preserving Communication with Cloud-based LLMsSam Lin, Wenyue Hua, Zhenting Wang, Mingyu Jin, Lizhou Fan, Yongfeng Zhang. 12342-12361 [doi]
- MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with ToolsNishant Subramani, Jason Eisner, Justin Svegliato, Benjamin Van Durme, Yu Su 0001, Sam Thomson. 12362-12375 [doi]
- PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio ClassificationAshish Seth, Ramaneswaran Selvakumar, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha. 12376-12394 [doi]
- Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective TasksJustin Zhao, Flor Miriam Plaza del Arco, Amanda Cercas Curry. 12395-12450 [doi]
- SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language ModelsCarter Teplica, Yixin Liu 0003, Arman Cohan, Tim G. J. Rudner. 12451-12469 [doi]
- ProSE: Diffusion Priors for Speech EnhancementSonal Kumar, Sreyan Ghosh, Utkarsh Tyagi, Anton Jeran Ratnarajah, Chandra Kiran Reddy Evuru, Ramani Duraiswami, Dinesh Manocha. 12470-12483 [doi]
- Mastering the Craft of Data Synthesis for CodeLLMsMeng Chen, Philip Arthur, Qianyu Feng, Cong Duy Vu Hoang, Yu-Heng Hong, Mahdi Kazemi Moghaddam, Omid Nezami, Duc Thien Nguyen, Gioacchino Tangari, Duy Vu, Thanh Vu, Mark Johnson 0001, Krishnaram Kenthapadi, Don Dharmasiri, Long Duong, Yuan-Fang Li. 12484-12500 [doi]
- ParaICL: Towards Parallel In-Context LearningXingxuan Li, Xuan-Phi Nguyen, Shafiq Joty, Lidong Bing. 12501-12511 [doi]
- CausalEval: Towards Better Causal Reasoning in Language ModelsLongxuan Yu, Delin Chen, Siheng Xiong, Qingyang Wu, Dawei Li, Zhikai Chen, Xiaoze Liu, Liangming Pan. 12512-12540 [doi]
- Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack DefenseYang Ouyang, Hengrui Gu 0002, Shuhang Lin, Wenyue Hua, Jie Peng, Bhavya Kailkhura, Meijun Gao, Tianlong Chen, Kaixiong Zhou. 12541-12554 [doi]
- DeCAP: Context-Adaptive Prompt Generation for Debiasing Zero-shot Question Answering in Large Language ModelsSuyoung Bae, Yunseok Choi, Jee-Hyong Lee 0001. 12555-12574 [doi]
- Reward-Guided Tree Search for Inference Time Alignment of Large Language ModelsChia-Yu Hung, Navonil Majumder, Ambuj Mehrish, Soujanya Poria. 12575-12593 [doi]
- Typographic Attacks in a Multi-Image SettingXiaomeng Wang, Zhengyu Zhao 0001, Martha A. Larson. 12594-12604 [doi]
- Tonguescape: Exploring Language Models Understanding of Vowel ArticulationHaruki Sakajo, Yusuke Sakai 0010, Hidetaka Kamigaito, Taro Watanabe. 12605-12619 [doi]
- CoRAC: Integrating Selective API Document Retrieval with Question Semantic Intent for Code Question AnsweringYunseok Choi, CheolWon Na, Jee-Hyong Lee 0001. 12620-12635 [doi]
- Pipeline Analysis for Developing Instruct LLMs in Low-Resource Languages: A Case Study on BasqueAnder Corral, Ixak Sarasua, Xabier Saralegi. 12636-12655 [doi]
- How to Make LLMs Forget: On Reversing In-Context Knowledge EditsPaul Youssef, Zhixue Zhao, Jörg Schlötterer, Christin Seifert. 12656-12669 [doi]
- PerCul: A Story-Driven Cultural Evaluation of LLMs in PersianErfan Moosavi Monazzah, Vahid Rahimzadeh, Yadollah Yaghoobzadeh, Azadeh Shakery, Mohammad Taher Pilehvar. 12670-12687 [doi]
- Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language ModelsSoham Poddar, Paramita Koley, Janardan Misra, Niloy Ganguly, Saptarshi Ghosh 0001. 12688-12704 [doi]
- CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research RepositoriesYijia Xiao, Runhui Wang, Luyang Kong, Davor Golac, Wei Wang. 12705-12723 [doi]
- SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented DataSuyoung Bae, Yunseok Choi, Hyojun Kim, Jee-Hyong Lee 0001. 12724-12738 [doi]
- Rationale-Guided Retrieval Augmented Generation for Medical Question AnsweringJiwoong Sohn, Yein Park, Chanwoong Yoon, Sihyeon Park, Hyeon Hwang, Mujeen Sung, Hyunjae Kim, Jaewoo Kang. 12739-12753 [doi]
- Prototype Conditioned Generative Replay for Continual Learning in NLPXi Chen, Min Zeng. 12754-12770 [doi]
- KODIS: A Multicultural Dispute Resolution Dialogue CorpusJames Hale, Sushrita Rakshit, Kushal Chawla, Jeanne M. Brett, Jonathan Gratch. 12771-12785 [doi]