- Unveiling the Deficiencies of Pre-trained Text-and-Layout Models in Real-world Visually-rich Document Information Extraction. Chong Zhang, Yixi Zhao, Yulu Xie, Chenshu Yuan, Yi Tu, Ya Guo, Mingxu Chai, Ziyu Shen, Yue Zhang, Qi Zhang 0001. 1-16 [doi]
- Entity-aware Cross-lingual Claim Detection for Automated Fact-checking. Rrubaa Panchendrarajan, Arkaitz Zubiaga. 17-33 [doi]
- WorkForceAgent-R1: Incentivizing Reasoning Capability in LLM-based Web Agents via Reinforcement Learning. Yuchen Zhuang, Di Jin, Jiaao Chen, Wenqi Shi, HanRui Wang, Chao Zhang. 34-49 [doi]
- Being Kind Isn't Always Being Safe: Diagnosing Affective Hallucination in LLMs. Sewon Kim, Jiwon Kim, Seungwoo Shin, Hyejin Chung, Daeun Moon, Yejin Kwon, Hyunsoo Yoon. 50-78 [doi]
- Joint Multimodal Preference Optimization for Fine-Grained Visual-Textual Alignment. Jiwon Kim, Hyunsoo Yoon. 79-94 [doi]
- Let's Put Ourselves in Sally's Shoes: Shoes-of-Others Prefilling Improves Theory of Mind in Large Language Models. Kazutoshi Shinoda, Nobukatsu Hojo, Kyosuke Nishida, Yoshihiro Yamazaki, Keita Suzuki, Hiroaki Sugiyama, Kuniko Saito. 95-109 [doi]
- Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey. Seunghyuk Cho, Zhenyue Qin, Yang Liu 0249, Youngbin Choi, Seungbeom Lee, Dongwoo Kim. 110-131 [doi]
- Examining the Utility of Self-disclosure Types for Modeling Annotators of Social Norms. Kieran Henderson, Kian Omoomi, Vasudha Varadarajan, Allison Lahnala, Charles Welch. 132-150 [doi]
- Position Paper: How Should We Responsibly Adopt LLMs in the Peer Review Process? Juhwan Choi, Jungmin Yun, Changhun Kim, Youngbin Kim. 151-165 [doi]
- Rad-Flamingo: A Multimodal Prompt driven Radiology Report Generation Framework with Patient-Centric Explanations. MD. Tousin Akhter, Devansh Lalwani, Kshitij Sharad Jadhav, Pushpak Bhattacharyya. 166-188 [doi]
- I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search. Zujie Liang, Feng Wei, Wujiang Xu, Yuxi Qian, Lin Chen, Xinhui Wu. 189-210 [doi]
- ThinkNote: Enhancing Knowledge Integration and Utilization of Large Language Models via Constructivist Cognition Modeling. Zhipeng Xu, Zhenghao Liu 0001, Yukun Yan, Shuo Wang 0013, Shi Yu 0001, Zheni Zeng, Chaojun Xiao, Zhiyuan Liu 0001, Ge Yu 0001, Chenyan Xiong. 211-229 [doi]
- Mitigating Copy Bias in In-Context Learning through Neuron Pruning. Ameen Ali, Lior Wolf, Ivan Titov. 230-251 [doi]
- How to Make LMs Strong Node Classifiers? Zhe Xu 0007, Kaveh Hassani, Si Zhang, Hanqing Zeng, Michihiro Yasunaga, Limei Wang, Dongqi Fu, Ning Yao, Bo Long, Hanghang Tong. 252-274 [doi]
- Rethinking Data Mixture for Large Language Models: A Comprehensive Survey and New Perspectives. Yajiao Liu, Congliang Chen, Junchi Yang, Ruoyu Sun 0001. 275-289 [doi]
- The Mediomatix Corpus: Parallel Data for Romansh Language Varieties via Comparable Schoolbooks. Zachary William Hopton, Jannis Vamvas, Andrin Büchler, Anna Rutkiewicz, Rico Cathomas, Rico Sennrich. 290-306 [doi]
- Unleashing the Unseen: Harnessing Benign Datasets for Jailbreaking Large Language Models. Wei Zhao, Zhe Li, Yige Li, Jun Sun 0001. 307-330 [doi]
- JEEM: Vision-Language Understanding in Four Arabic Dialects. Karima Kadaoui, Hanin Atwany, Hamdan Al-Ali, Abdelrahman Mohamed, Ali Mekky, Sergei Tilga, Natalia Fedorova, Ekaterina Artemova, Hanan Aldarmaki, Yova Kementchedjhieva. 331-354 [doi]
- Detecting Primary Progressive Aphasia (PPA) from Text: A Benchmarking Study. Ghofrane Merhbene, Fabian Lecron, Philippe Fortemps, Bradford C. Dickerson, Mascha Kurpicz-Briki, Neguine Rezaii. 355-374 [doi]
- Geometric Interpretation of Layer Normalization and a Comparative Analysis with RMSNorm. Akshat Gupta, Atahan Ozdemir, Caoqinwei Gong, Gopala Anumanchipalli. 375-407 [doi]
- Continual Pretraining on Encrypted Synthetic Data for Privacy-Preserving LLMs. Honghao Liu, Xuhui Jiang, Chengjin Xu, Cehao Yang, Yiran Cheng, Lionel Ni, Jian Guo 0016. 408-425 [doi]
- Do Diacritics Matter? Evaluating the Impact of Arabic Diacritics on Tokenization and LLM Benchmarks. Go Inoue, Bashar Alhafni, Nizar Habash, Timothy Baldwin. 426-442 [doi]
- Pelican Soup Framework: A Theoretical Framework for Language Model Capabilities. Ting-Rui Chiang, Dani Yogatama. 443-464 [doi]
- I Know, but I Don't Know! How Persona Conflict Undermines Instruction Adherence in Large Language Models. Seonmin Koo, Jinsung Kim, HeuiSeok Lim. 465-489 [doi]
- Persona-driven Simulation of Voting Behavior in the European Parliament with Large Language Models. Maximilian Kreutner, Marlene Lutz, Markus Strohmaier. 490-511 [doi]
- Exploring Iterative Controllable Summarization with Large Language Models. Sangwon Ryu, Heejin Do, Daehui Kim, Hwanjo Yu, Dongwoo Kim, Yunsu Kim 0001, Gary Lee 0001, Jungseul Ok. 512-528 [doi]
- The Price of Thought: A Multilingual Analysis of Reasoning, Performance, and Cost of Negotiation in Large Language Models. Sherzod Hakimov, Roland Bernard, Tim Leiber, Karl Osswald, Kristina Richert, Ruilin Yang, Raffaella Bernardi, David Schlangen. 529-570 [doi]
- ART: Adaptive Reasoning Trees for Explainable Claim Verification. Sahil Wadhwa, Himanshu Kumar, Guanqun Yang, Abbaas Alif Mohamed Nishar, Pranab Mohanty, Swapnil Shinde, Yue Wu. 571-586 [doi]
- VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy. Yu Cui, Sicheng Pan, YiFei Liu, Haibin Zhang, Cong Zuo 0001. 587-609 [doi]
- VisDoT: Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought. Eunsoo Lee, JeongWoo Lee, Minki Hong, Jangho Choi, Jihie Kim. 610-640 [doi]
- KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization. Mingbo Song, Heming Xia, Jun Zhang 0069, Chak Tou Leong, Qiancheng Xu, Wenjie Li 0002, Sujian Li. 641-655 [doi]
- Towards Robust Evaluation of Visual Activity Recognition: Resolving Verb Ambiguity with Sense Clustering. Louie Hong Yao, Nicholas Jarvis, Tianyu Jiang 0001. 656-672 [doi]
- HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition. Gio Paik, Yongbeom Kim, Soungmin Lee, Sangmin Ahn, Chan Woo Kim. 673-681 [doi]
- Complexity-aware fine-tuning. Andrey Goncharov 0002, Daniil Vyazhev, Petr Sychev, Edvard A. Khalafyan, Alexey Zaytsev 0002. 682-696 [doi]
- Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Question Answering Task. Leonardo Ranaldi. 697-716 [doi]
- SpatialMath: Spatial Comprehension-Infused Symbolic Reasoning for Mathematical Problem-Solving. Ashutosh Bajpai, Akshat Bhandari, Akshay Uttama Nambi, Tanmoy Chakraborty 0002. 717-742 [doi]
- Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs. Paiheng Xu, Gang Wu 0013, Xiang Chen 0010, Tong Yu 0001, Chang Xiao, Franck Dernoncourt, Tianyi Zhou 0001, Wei Ai 0002, Viswanathan (Vishy) Swaminathan. 743-759 [doi]
- How Important is 'Perfect' English for Machine Translation Prompts? Patrícia Schmidtová, Niyati Bafna, Seth Aycock, Gianluca Vico, Wiktor Kamzela, Kathy Hämmerl, Vilém Zouhar. 760-777 [doi]
- KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation. Jiabin Fan, Guoqing Luo, Michael Bowling, Lili Mou. 778-796 [doi]
- Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering. Haiyan Zhao 0003, Xuansheng Wu, Fan Yang 0023, Bo Shen, Ninghao Liu 0001, Mengnan Du. 797-808 [doi]
- Shifting Perspectives: Steering Vectors for Robust Bias Mitigation in LLMs. Zara Siddique, Irtaza Khalid, Liam D. Turner, Luis Espinosa Anke. 809-820 [doi]
- MAPS: A Multilingual Benchmark for Agent Performance and Security. Omer Hofman, Jonathan Brokman, Oren Rachmil, Shamik Bose, Vikas Pahuja, Toshiya Shimizu, Trisha Starostina, Kelly Marchisio, Seraphina Goldfarb-Tarrant, Roman Vainshtein. 821-845 [doi]
- Linking Knowledge to Care: Knowledge Graph-Augmented Medical Follow-Up Question Generation. Liwen Sun, Xiang Yu, Ming Tan, Zhuohao Chen, Anqi Cheng, Ashutosh Joshi, Chenyan Xiong. 846-853 [doi]
- DebateQA: Evaluating Question Answering on Debatable Knowledge. Rongwu Xu, Xuan Qi, Zehan Qi, Wei Xu 0039, Zhijiang Guo. 854-885 [doi]
- Personal Information Parroting in Language Models. Nishant Subramani, Kshitish Ghate, Mona T. Diab. 886-895 [doi]
- Harmful Factuality: LLMs Correcting What They Shouldn't. Mingchen Li, Hanzhi Zhang, Heng Fan 0001, Junhua Ding, Yunhe Feng. 896-912 [doi]
- Toward Beginner-Friendly LLMs for Language Learning: Controlling Difficulty in Conversation. Meiqing Jin, Liam Dugan, Chris Callison-Burch. 913-936 [doi]
- CodeGuard: Improving LLM Guardrails in CS Education. Nishat Raihan, Noah Erdachew, Jayoti Devi, Joanna C. S. Santos, Marcos Zampieri. 937-949 [doi]
- ATOM: AdapTive and OptiMized dynamic temporal knowledge graph construction using LLMs. Yassir Lairgi, Ludovic Moncla, Khalid Benabdeslem, Rémy Cazabet, Pierre Cléau. 950-966 [doi]
- On the Interplay between Human Label Variation and Model Fairness. Kemal Kurniawan, Meladel Mistica, Timothy Baldwin, Jey Han Lau. 967-976 [doi]
- Where do LLMs currently stand on biomedical NER in both clean and noisy settings? Christophe Ye, Cassie S. Mitchell. 977-1001 [doi]
- Scaling Data-Constrained Language Models with Synthetic Data. Hirokazu Kiyomaru, Yusuke Oda, Takashi Kodama, Chaoran Liu, Daisuke Kawahara. 1002-1016 [doi]
- The Unintended Trade-off of AI Alignment: Balancing Hallucination Mitigation and Safety in LLMs. Omar Mahmoud, Ali Khalil, Thommen George Karimpanal, Buddhika Laknath Semage, Santu Rana. 1017-1037 [doi]
- The Model's Language Matters: A Comparative Privacy Analysis of LLMs. Abhishek K. Mishra, Antoine Boutet, Lucas Magnana. 1038-1048 [doi]
- Towards the First NLP Benchmark for Ladin - an Extremely Low-Resource Language. Ulin Nuha, Adam Jatowt. 1049-1064 [doi]
- DRAGON: Domain-specific Robust Automatic Data Generation for RAG Optimization. Haiyang Shen, Hang Yan 0012, Zhongshi Xing, Mugeng Liu, Yue Li, Zhiyang Chen, Yuxiang Wang, Jiuzheng Wang, Yun Ma 0002. 1065-1078 [doi]
- Causal Activation Steering via Sparse Mediation. Toan Doan, Uyen Le, Thin Nguyen. 1079-1097 [doi]
- Causal Direct Preference Optimization for Language Model Alignment. Uyen Le, Thin Nguyen, Toan Nguyen, Toan Doan, Trung Le, Bac Le. 1098-1113 [doi]
- LLMs Faithfully and Iteratively Compute Answers During CoT: A Systematic Analysis With Multi-step Arithmetics. Keito Kudo, Yoichi Aoki, Tatsuki Kuribayashi, Shusaku Sone, Masaya Taniguchi, Ana Brassard, Keisuke Sakaguchi, Kentaro Inui. 1114-1153 [doi]
- VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery. Jinchao Ge, Tengfei Cheng, Biao Wu 0006, Zeyu Zhang 0006, Shiya Huang, Judith Bishop, Gillian Shepherd, Meng Fang, Ling Chen 0006, Yang Zhao 0019. 1154-1167 [doi]
- PromptPrism: A Linguistically-Inspired Taxonomy for Prompts. Sullam Jeoung, Yueyan Chen, Yi Zhang, Shuai Wang, Haibo Ding, Lin Lee Cheong. 1168-1192 [doi]
- HiGraAgent: Dual-Agent Adaptive Reasoning over Hierarchical Knowledge Graph for Open Domain Multi-hop Question Answering. Hung Luu, Long S. T. Nguyen, Trung Pham, Hieu Pham, Tho Quan. 1193-1217 [doi]
- Hire Your Anthropologist! Rethinking Culture Benchmarks Through an Anthropological Lens. Mai Alkhamissi, Yunze Xiao, Badr AlKhamissi, Mona T. Diab. 1218-1235 [doi]
- Suppressing Final Layer Hidden State Jumps in Transformer Pretraining. Keigo Shibata, Kazuki Yano, Ryosuke Takahashi, Jaesung Lee, Wataru Ikeda, Jun Suzuki 0001. 1236-1262 [doi]
- Intention-Adaptive LLM Fine-Tuning for Text Revision Generation. Zhexiong Liu, Diane J. Litman. 1263-1281 [doi]
- ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting. Yuxing Tian, Fengran Mo, Weixu Zhang, Yiyan Qi, Jian-Yun Nie. 1282-1295 [doi]
- MapAgent: A Hierarchical Agent for Geospatial Reasoning with Dynamic Map Tool Integration. Md Hasebul Hasan, Mahir Labib Dihan, Tanzima Hashem, Mohammed Eunus Ali, Md. Rizwan Parvez. 1296-1322 [doi]
- Comprehensive Study of Bilingual and Multi-category Instruction Pre-training. Takashi Kodama, Yusuke Oda. 1323-1340 [doi]
- Reflect, Rewrite, Repeat: How Simple Arithmetic Enables Advanced Reasoning in Small Language Models. Mengdie Flora Wang, Haochen Xie, Mun Young Kim, Baishali Chaudhury, Meghana Ashok, Suren Gunturu, Sungmin Hong, Jae Oh Woo. 1341-1363 [doi]
- Don't Judge Code by Its Cover: Exploring Biases in LLM Judges for Code Evaluation. Jiwon Moon 0001, Yerin Hwang, Dongryeol Lee, Taegwan Kang, Yongil Kim, Kyomin Jung. 1364-1389 [doi]
- COMMUNITYNOTES: A Dataset for Exploring the Helpfulness of Fact-Checking Explanations. Rui Xing 0002, Preslav Nakov, Timothy Baldwin, Jey Han Lau. 1390-1411 [doi]
- Persona Jailbreaking in Large Language Models. Jivnesh Sandhan, Fei Cheng 0002, Tushar Sandhan, Yugo Murawaki. 1412-1430 [doi]
- ParsTranslit: Truly Versatile Tajik-Farsi Transliteration. Rayyan Merchant, Kevin Tang. 1431-1443 [doi]
- One Sentence, Two Embeddings: Contrastive Learning of Explicit and Implicit Semantic Representations. Kohei Oda, Po-Min Chuang, Kiyoaki Shirai, Natthawut Kertkeidkachorn. 1444-1452 [doi]
- ETOM: A Five-Level Benchmark for Evaluating Tool Orchestration within the MCP Ecosystem. Jia-Kai Dong, I-Wei Huang, Chun-Tin Wu, Yi-Tien Tsai. 1453-1488 [doi]
- SymCode: A Neurosymbolic Approach to Mathematical Reasoning via Verifiable Code Generation. Sina Bagheri Nezhad, Yao Li, Ameeta Agrawal. 1489-1503 [doi]
- Unsupervised Detection of LLM-Generated Text in Korean Using Syntactic and Semantic Cues. Heejeong Jeon, Minsu Park, Yunseok Choi, Eunil Park. 1504-1518 [doi]
- NLP Privacy Risk Identification in Social Media (NLP-PRISM): A Survey. Dhiman Goswami, Jai Kruthunz Naveen Kumar, Sanchari Das 0001. 1519-1541 [doi]
- CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom. Yisen Li, Lingfeng Yang, Wenxuan Shen, Pan Zhou 0001, Yao Wan 0001, Weiwei Lin, Dongping Chen. 1542-1569 [doi]
- Bias in the Ear of the Listener: Assessing Sensitivity in Audio Language Models Across Linguistic, Demographic, and Positional Variations. Sheng-Lun Wei, Yu-Ling Liao, Yen-Hua Chang, Hen-Hsen Huang, Hsin-Hsi Chen. 1570-1589 [doi]
- Pushing the Frontiers of Scientific Fact-Checking: The SCINLP Dataset. Iffat Maab, Junichi Yamagishi. 1590-1617 [doi]
- SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation. Nobuhiro Ueda, Yuyang Dong, Krisztián Boros, Daiki Ito, Takuya Sera, Masafumi Oyamada. 1618-1637 [doi]
- Unified Multimodal Interleaved Document Representation for Retrieval. Jaewoo Lee 0001, Joonho Ko, Jinheon Baek, Soyeong Jeong, Sung Ju Hwang. 1638-1654 [doi]
- TELLME: Test-Enhanced Learning for Language Model Enrichment. MinJun Kim, Inho Won, HyeonSeok Lim, Minkyu Kim, Junghun Yuk, Wooyoung Go, Jongyoul Park, Jungyeul Park, Kyungtae Lim. 1655-1677 [doi]
- Beyond Accuracy: Alignment and Error Detection across Languages in the Bi-GSM8K Math-Teaching Benchmark. Jieun Park, Kyungtae Lim, Joon-Ho Lim. 1678-1704 [doi]
- VN-MTEB: Vietnamese Massive Text Embedding Benchmark. Loc Pham, Tung Luu, Thu Vo, Minh Nguyen, Viet Hoang. 1705-1725 [doi]
- See More, Store Less: Memory-Efficient Resolution for Video Moment Retrieval. Mingyu Jeon, Sungjin Han, Jinkwon Hwang, Minchol Kwon, Jonghee Kim, Junyeong Kim. 1726-1736 [doi]
- RB-LoRA: Rank-Balanced Aggregation for Low-Rank Adaptation with Federated Fine-Tuning. Sihyeon Ha, Yongjeong Oh, Yo-Seb Jeon. 1737-1746 [doi]
- Garbage In, Reasoning Out? Why Benchmark Scores are Unreliable and What to Do About It. Seyed Mahed Mousavi, Edoardo Cecchinato, Lucia Hornikova, Giuseppe Riccardi. 1747-1759 [doi]
- Confidence-Driven Multi-Scale Model Selection for Cost-Efficient Inference. Bo-Wei Chen, Chung-Chi Chen 0001, An-Zi Yen. 1760-1770 [doi]
- Quantifying the Impact of Structured Output Format on Large Language Models through Causal Inference. Han Yuan, Yue Zhao, Li Zhang, Wuqiong Luo, Zheng Ma. 1771-1795 [doi]
- Breaking the Illusion of Reasoning in Polish LLMs: Quality over Quantity of Thought. Dzmitry Pihulski, Mikolaj Langner, Jan Eliasz, Przemyslaw Kazienko, Jan Kocon, Teddy Ferdinan. 1796-1811 [doi]
- RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library. Jiapeng Wang, Jinhao Jiang, Zhiqiang Zhang, Jun Zhou 0011, Xin Zhao. 1812-1827 [doi]
- WebNovelBench: Placing LLM Novelists on the Web Novel Distribution. Liangtao Lin, Jun Zheng, Haidong Wang. 1828-1847 [doi]
- From Semantics to Style: A Cross-Dataset Comparative Framework for Sentence Similarity Predictions. Yusuke Yamauchi, Akiko Aizawa. 1848-1877 [doi]
- Feature Drift: How Fine-Tuning Repurposes Representations in LLMs. Andrey V. Galichin, Anton Korznikov, Alexey Dontsov, Oleg Rogov, Elena Tutubalina, Ivan V. Oseledets. 1878-1887 [doi]
- Detecting Winning Arguments with Large Language Models and Persuasion Strategies. Tiziano Labruna, Arkadiusz Modzelewski, Giorgio Satta, Giovanni Da San Martino. 1888-1915 [doi]
- The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning. Mingkai Tian, Guorong Li, Yuankai Qi, Anton van den Hengel, Qingming Huang. 1916-1929 [doi]
- Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning. Sara Rajaee, Rochelle Choenni, Ekaterina Shutova, Christof Monz. 1930-1939 [doi]
- Nuanced Toxicity Detection in Spanish: A New Corpus and Benchmark Study. Alba María Mármol-Romero, Robiert Sepúlveda-Torres, Estela Saquete, María-Teresa Martín Valdivia, L. Alfonso Ureña. 1940-1954 [doi]
- Persona Switch: Mixing Distinct Perspectives in Decoding Time. Junseok Kim, Nakyeong Yang, Kyomin Jung. 1955-1967 [doi]
- Revealing the Truth with ConLLM for Detecting Multi-Modal Deepfakes. Gautam Siddharth Kashyap, Harsh Joshi, Niharika Jain, Ebad Shabbir, Jiechao Gao, Nipun Joshi, Usman Naseem. 1968-1978 [doi]
- Detection of Adversarial Prompts with Model Predictive Entropy. Franziska Rubenbauer, Sebastian Steindl, Patrick Levi, Daniel Loebenberger, Ulrich Schäfer 0001. 1979-1993 [doi]
- Actors, Frames and Arguments: A Multi-Decade Computational Analysis of Climate Discourse in Financial News using Large Language Models. Ruiran Su, Markus Leippold, Janet B. Pierrehumbert. 1994-2014 [doi]
- RECAP: REwriting Conversations for Intent Understanding in Agentic Planning. Kushan Mitra, Dan Zhang 0025, Hannah Kim 0001, Estevam Hruschka. 2015-2033 [doi]
- Modeling Turn-Taking with Semantically Informed Gestures. Varsha Suresh, Muhammad Hamza Mughal, Christian Theobalt, Vera Demberg. 2034-2041 [doi]
- Do Large Language Models Reflect Demographic Pluralism in Safety? Usman Naseem, Gautam Siddharth Kashyap, Sushant Kumar Ray, Rafiq Ali, Ebad Shabbir, Abdullah Mohammad. 2042-2052 [doi]
- Adversarial Decoding: Generating Readable Documents for Adversarial Objectives. Collin Zhang, Tingwei Zhang, Vitaly Shmatikov. 2053-2068 [doi]
- MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Dialogue Evaluators. John Mendonça, Alon Lavie, Isabel Trancoso. 2069-2097 [doi]
- Which Works Best for Vietnamese? A Practical Study of Information Retrieval Methods across Domains. Long S. T. Nguyen, Tho Quan. 2098-2119 [doi]
- MemeWeaver: Inter-Meme Graph Reasoning for Sexism and Misogyny Detection. Paolo Italiani, David Gimeno-Gómez, Luca Ragazzi, Gianluca Moro, Paolo Rosso. 2120-2134 [doi]
- SEAM: Bridging the Temporal-Semantic Granularity Gap for LLM-based Speech Recognition. Junseok Oh, Ji-Hwan Kim. 2135-2144 [doi]
- Foundations of LLM Knowledge Materialization: Termination, Reproducibility, Robustness. Luca Giordano, Simon Razniewski. 2145-2164 [doi]
- Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health. Trung Hieu Ngo, Adrien Bazoge, Solen Quiniou, Pierre-Antoine Gourraud, Emmanuel Morin. 2165-2180 [doi]
- FOL-Traces: Verified First-Order Logic Reasoning Traces at Scale. Isabelle Lee, Sarah Liaw, Dani Yogatama. 2181-2203 [doi]
- Uncertainty Quantification for Evaluating Gender Bias in Machine Translation. Ieva Staliunaite, Julius Cheng, Andreas Vlachos 0001. 2204-2225 [doi]
- PIRA: Preference-Oriented Instruction-Tuned Reward Models with Dual Aggregation. Yongfu Xue. 2226-2234 [doi]
- The Hidden Bias: A Study on Explicit and Implicit Political Stereotypes in Large Language Models. Konrad Löhr, Shuzhou Yuan, Michael Färber 0001. 2235-2252 [doi]
- TIPA: Typologically Informed Parameter Aggregation. Stef Accou, Wessel Poelman. 2253-2267 [doi]
- Can Calibration of Positional Encodings Enhance Long Context Utilization? Tom Zehle, Matthias Aßenmacher. 2268-2280 [doi]
- FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition. Jonas Golde, Patrick Haller 0002, Alan Akbik. 2281-2300 [doi]
- Bias in the East, Bias in the West: A Bilingual Analysis of LLM Political Bias on U.S.- and China-Related Issues. Ying Ying Lim, Paul Röttger. 2301-2326 [doi]
- Ask Me Again Differently: GRAS for Measuring Bias in Vision Language Models on Gender, Race, Age, and Skin Tone. Shaivi Malik, Hasnat Md Abdullah, Sriparna Saha 0001, Amit P. Sheth. 2327-2388 [doi]
- A Simple and Efficient Learning-Style Prompting for LLM Jailbreaking. Xuan Luo, Yue Wang, Zefeng He, Geng Tu, Jing Li, Ruifeng Xu 0001. 2389-2406 [doi]
- Aggregating Crowd of LLMs for Cost-Effective Data Annotation. Jiacheng Liu, Xiaofeng Hou. 2407-2419 [doi]
- Representation Collapse in Machine Translation Through the Lens of Angular Dispersion. Evgeniia Tokarchuk, Maya K. Nachesa, Sergey Troshin 0001, Vlad Niculae. 2420-2431 [doi]
- Can LLMs Reason Like Doctors? Exploring the Limits of Large Language Models in Complex Medical Reasoning. Flavio Merenda, José Manuél Gómez-Pérez, German Rigau. 2432-2452 [doi]
- Testing Low-Resource Language Support in LLMs Using Language Proficiency Exams: the Case of Luxembourgish. Cedric Lothritz, Jordi Cabot, Laura Bernardy. 2453-2476 [doi]
- Unveiling Decision-Making in LLMs for Text Classification: Extraction of influential and interpretable concepts with Sparse Autoencoders. Mathis Le Bail, Jérémie Dentan, Davide Buscaldi, Sonia Vanier. 2477-2504 [doi]
- TextMineX: Data, Evaluation Framework and Ontology-guided LLM Pipeline for Humanitarian Mine Action. Chenyue Zhou, Gürkan Solmaz, Flavio Cirillo, Kiril Gashteovski, Jonathan Fürst. 2505-2523 [doi]
- MathMist: A Parallel Multilingual Benchmark Dataset for Mathematical Problem Solving and Reasoning. Mahbub E. Sobhani, Md. Faiyaz Abdullah Sayeedi, Tasnim Mohiuddin, Md Mofijul Islam, Swakkhar Shatabda. 2524-2550 [doi]
- Enhancing Reliability in Community Question Answering with an Expert-Oriented RAG System. Seyyede Zahra Aftabi, Saeed Farzi. 2551-2569 [doi]
- Unsupervised Text Style Transfer for Controllable Intensity. Shuhuan Gu, Wenbiao Tao, Xinchen Ma, Kangkang He, Ye Guo, Xiang Li, Yunshi Lan. 2570-2584 [doi]
- SchemaGraphSQL: Efficient Schema Linking with Pathfinding Graph Algorithms for Text-to-SQL on Large-Scale Databases. AmirHossein Safdarian, Milad Mohammadi, Ehsan Jahanbakhsh, Mona Shahamat Naderi, Heshaam Faili. 2585-2599 [doi]
- Binary Token-Level Classification with DeBERTa for All-Type MWE Identification: A Lightweight Approach with Linguistic Enhancement. Diego Rossini, Lonneke van der Plas. 2600-2610 [doi]
- The Correlation Between Emotion in Text and Speech Segments is Limited: A Cross-Modal Study. David Lindevelt, Suzan Verberne, Joost Broekens. 2611-2621 [doi]
- Seeing All Sides: Multi-Perspective In-Context Learning for Subjective NLP. Benedetta Muscato, Yue Li, Gizem Gezici, Zhixue Zhao, Fosca Giannotti. 2622-2638 [doi]
- Beyond Divergent Creativity: A Human-Based Evaluation of Creativity in Large Language Models. Kumiko Nakajima, Jan Zuiderveld, Sandro Pezzelle. 2639-2660 [doi]
- Are Multimodal LLMs Movie Buffs? Carlo Bretti, Pascal Mettes, Nanne van Noord. 2661-2677 [doi]
- Process Evaluation for Agentic Systems. Milan Gritta, Debjit Paul, Xiaoguang Li, Lifeng Shang, Jun Wang, Gerasimos Lampouras. 2678-2692 [doi]
- MIMIC: Multi-party Dialogue Augmentation via Speaker Stylistic Transfer. Gaetano Cimino, Giuseppe Carenini, Vincenzo Deufemia. 2693-2719 [doi]
- TechING: Towards Real World Technical Image Understanding via VLMs. Tafazzul Nadeem, Bhavik Shangari, Manish Rai, Gagan Raj Gupta 0001, Ashutosh Modi. 2720-2749 [doi]
- Multilingual-To-Multimodal (M2M): Unlocking New Languages with Monolingual Text. Piyush Singh Pasi. 2750-2771 [doi]
- Do GUI Grounders Truly Understand UI Elements? Surgan Jandial, Yinheng Li, Justin Wagle, Kazuhito Koishida. 2772-2785 [doi]
- Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks. Haowei Fu, Bo Ni, Han Xu, Kunpeng Liu, Dan Lin, Tyler Derr. 2786-2799 [doi]
- SafeSearch: Do Not Trade Safety for Utility in LLM Search Agents. Qiusi Zhan, Angeline Budiman-Chan, Abdelrahman Zayed, Xingzhi Guo, Daniel Kang 0001, Joo-Kyung Kim. 2800-2815 [doi]
- SAGE: A Top-Down Bottom-Up Knowledge-Grounded User Simulator for Multi-turn Agent Evaluation. Ryan Shea, Yunan Lu 0001, Liang Qiu, Zhou Yu. 2816-2839 [doi]
- Better as Generators Than Classifiers: Leveraging LLMs and Synthetic Data for Low-Resource Multilingual Classification. Branislav Pecher, Ján Cegin, Róbert Belanec, Ivan Srba, Jakub Simko, Mária Bieliková. 2840-2857 [doi]
- Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategic Conversations. Zijie Liu, Xinyu Zhao, Jie Peng 0002, Jinhao Duan, Zhuangdi Zhu, Qingyu Chen 0001, Kaidi Xu, Xia Hu 0001, Tianlong Chen 0001. 2858-2872 [doi]
- DF-RAG: Query-Aware Diversity for Retrieval-Augmented Generation. Saadat Hasan Khan, Spencer Hong, Jingyu Wu, Kevin Lybarger, Youbing Yin, Erin Babinsky, Daben Liu. 2873-2894 [doi]
- Hearing Between the Lines: Unlocking the Reasoning Power of LLMs for Speech Evaluation. Arjun Chandra, Kevin Miller, Venkatesh Ravichandran, Constantinos Papayiannis, Venkatesh Saligrama. 2895-2916 [doi]
- Improving Chain-of-Thought for Logical Reasoning via Attention-Aware Intervention. Phuong Minh Nguyen, Dang Huu-Tien, Naoya Inoue. 2917-2941 [doi]
- Thinking Long, but Short: Stable Sequential Test-Time Scaling for Large Reasoning Models. Michael R. Metel, Yufei Cui, Boxing Chen, Prasanna Parthasarathi. 2942-2951 [doi]
- Defeating Cerberus: Privacy-Leakage Mitigation in Vision Language Models. Boyang Zhang, Istemi Ekin Akkus, Ruichuan Chen, Alice Dethise, Klaus Satzke, Ivica Rimac, Yang Zhang 0016. 2952-2965 [doi]
- TruthTrap: A Bilingual Benchmark for Evaluating Factually Correct Yet Misleading Information in Question Answering. Mohammadamin Shafiei, Hamidreza Saffari, Mohammad Taher Pilehvar, Alessandro Raganato. 2966-2987 [doi]
- FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression. Jiayi Tian, Ryan Solgi, Jinming Lu, Yifan Yang, Hai Li 0008, Zheng Zhang 0005. 2988-3002 [doi]
- Negative Sampling Techniques in Dense Retrieval: A Survey. Laurin Wischounig, Abdelrahman Abdallah, Adam Jatowt. 3003-3020 [doi]
- Multi-Agent Procedural Graph Extraction with Structural and Logical Refinement. Wangyang Ying, Yanchi Liu, Xujiang Zhao, Wei Cheng, Zhengzhang Chen, Wenchao Yu, Yanjie Fu, Haifeng Chen. 3021-3034 [doi]
- MADIAVE: Multi-Agent Debate for Implicit Attribute Value Extraction. Wei-Chieh Huang, Cornelia Caragea. 3035-3053 [doi]
- DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router. Minghao Guo, Qingcheng Zeng, Xujiang Zhao, Yanchi Liu, Wenchao Yu, Mengnan Du, Haifeng Chen, Wei Cheng 0002. 3054-3077 [doi]
- Analyzing LLM Instruction Optimization for Tabular Fact Verification. Xiaotang Du, Giwon Hong, Wai-Chung Kwan, Rohit Saxena, Ivan Titov 0001, Pasquale Minervini, Emily Allaway. 3078-3108 [doi]
- XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark. Ioan-Paul Ciobanu, Andrei Iulian Hîji, Nicolae-Catalin Ristea, Paul Irofti, Cristian Rusu, Radu-Tudor Ionescu. 3109-3120 [doi]
- CLEAR-3K: Assessing Causal Explanatory Capabilities in Language Models. Naiming Liu, Richard G. Baraniuk, Shashank Sonkar. 3121-3136 [doi]
- Imbalanced Gradients in RL Post-Training of Multi-Task LLMs. Runzhe Wu, Ankur Samanta, Ayush Jain, Scott Fujimoto, Jeongyeol Kwon, Ben Kretzu, Youliang Yu, Kaveh Hassani, Boris Vidolov, Yonathan Efroni. 3137-3150 [doi]
- BayesFlow: A Probability Inference Framework for Meta-Agent Assisted Workflow Generation. Bo Yuan, Yun Zhou, Zhichao Xu 0001, Kiran Ramnath, Aosong Feng, Balasubramaniam Srinivasan. 3151-3179 [doi]
- HapticLLaMA: A Multimodal Sensory Language Model for Haptic Captioning. Guimin Hu, Daniel Hershcovich, Hasti Seifi. 3180-3192 [doi]
- Token-Level Precise Attack on RAG: Searching for the Best Alternatives to Mislead Generation. Zizhong Li, Haopeng Zhang, Jiawei Zhang. 3193-3206 [doi]
- PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference. Krishna Teja Chitty-Venkata, Jie Ye, Siddhisanket Raskar, Anthony Kougkas, Xian-He Sun, Murali Emani, Venkatram Vishwanath, Bogdan Nicolae. 3207-3218 [doi]
- SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity. Ishani Mondal, Meera Bharadwaj, Ayush Roy, Aparna Garimella, Jordan Lee Boyd-Graber. 3219-3245 [doi]
- ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations. Amr Gomaa, Ahmed Salem 0001, Sahar Abdelnabi. 3246-3268 [doi]
- SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning. Kaiwen Zhou 0002, Ahmed Elgohary, A S. M. Iftekhar, Amin Saied. 3269-3292 [doi]
- Who You Are, What You Say: Intra- and Inter- Context Personality for Emotion Recognition in Conversation. Tazeek Bin Abdur Rakib, Lay-Ki Soon, Wern Han Lim. 3293-3308 [doi]
- DRIVINGVQA: A Dataset for Interleaved Visual Chain-of-Thought in Real-World Driving Scenarios. Charles Corbière, Simon Roburin, Syrielle Montariol, Antoine Bosselut, Alexandre Alahi. 3309-3333 [doi]
- SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback. Fangyuan Xu, Rujun Han, Yanfei Chen, Zifeng Wang 0002, I-Hung Hsu, Jun Yan 0001, Vishy Tirumalashetty, Eunsol Choi, Tomas Pfister, Chen-Yu Lee. 3334-3351 [doi]
- Negative-Aware Diffusion Process for Temporal Knowledge Graph Extrapolation. Yanglei Gan, Peng He, Yuxiang Cai, Run Lin, Guanyu Zhou, Qiao Liu 0003. 3352-3367 [doi]
- DS2-Instruct: Domain-Specific Data Synthesis for Large Language Models Instruction Tuning. Ruiyao Xu, Noelle I. Samia, Han Liu. 3368-3384 [doi]
- DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards. Aaryaman Kartha, Ahmed Masry, Mohammed Saidul Islam, Thinh Lang, Shadikur Rahman, Ridwan Mahbub, Mizanur Rahman, Mahir Ahmed, Md. Rizwan Parvez, Enamul Hoque, Shafiq Joty. 3385-3407 [doi]
- Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know? Zhiting Mei, Christina Zhang, Tenny Yin, Justin Lidard, Ola Shorinwa, Anirudha Majumdar. 3408-3458 [doi]
- AfriMMT-EA: Multi-domain Machine Translation for Low-Resource East African Languages. Naome A. Etori, Kelechi Ezema, Nathaniel Romney Robinson, Davis David, Alfred Malengo Kondoro, Elisha Ondieki Makori, Michael S. Mollel, Maria L. Gini. 3459-3492 [doi]
- Diffusion Language Model Inference with Monte Carlo Tree Search. Zheng Huang, Kiran Ramnath, Yueyan Chen, Aosong Feng, Sangmin Woo, Balasubramaniam Srinivasan, Zhichao Xu 0001, Kang Zhou, Shuai Wang, Haibo Ding, Lin Lee Cheong. 3493-3512 [doi]
- DWA-KD: Dual-Space Weighting and Time-Warped Alignment for Cross-Tokenizer Knowledge Distillation. Duc Trung Vu, Chi Pham Khanh, Phi Van Dat, Ngo Van Linh 0001, Dinh Viet Sang, Trung Le 0001. 3513-3527 [doi]
- Harnessing Consistency for Robust Test-Time LLM Ensemble. Zhichen Zeng 0001, Qi Yu, Xiao Lin 0016, Ruizhong Qiu, Xuying Ning, Tianxin Wei, Yuchen Yan, Jingrui He, Hanghang Tong. 3528-3545 [doi]
- AutoAnoEval: Semantic-Aware Model Selection via Tree-Guided LLM Reasoning for Tabular Anomaly Detection. Suhee Yoon, Sanghyu Yoon, Ye Seul Sim, Seungdong Yoa, Dongmin Kim, Soonyoung Lee, Hankook Lee, Woohyung Lim. 3546-3560 [doi]
- Conformal Feedback Alignment: Quantifying Answer-Level Reliability for Robust LLM Alignment. Tiejin Chen, Xiaoou Liu, Vishnu Nandam, Kuanru Liou, Hua Wei 0001. 3561-3572 [doi]
- ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization. Sunzhu Li, Zhiyu Lin, Jiale Zhao, Shuling Yang, Chen Wei. 3573-3592 [doi]
- LitE-SQL: A Lightweight and Efficient Text-to-SQL Framework with Vector-based Schema Linking and Execution-Guided Self-Correction. Shengmin Piao, Jieun Lee, Sanghyun Park. 3593-3608 [doi]
- Beyond Coherence: Improving Temporal Consistency and Interpretability in Dynamic Topic Models. Thanh Vinh Nguyen, Ngo Van Dong, Minh Chu Xuan, Tung Nguyen, Linh Ngo Van, Dinh Viet Sang, Trung Le. 3609-3629 [doi]
- Interpretable Graph-Language Modeling for Detecting Youth Illicit Drug Use. Yiyang Li, Zehong Wang, Zhengqing Yuan, Zheyuan Zhang, Keerthiram Murugesan, Chuxu Zhang, Yanfang Ye 0001. 3630-3647 [doi]
- Tailoring Memory Granularity for Multi-Hop Reasoning over Long Contexts. Peijun Qing, Xingjian Diao, Chiyu Ma, Saeed Hassanpour, Soroush Vosoughi. 3648-3666 [doi]
- Unlocking Large Audio-Language Models for Interactive Language Learning. Hongfu Liu, Zhouying Cui, Xiangming Gu, Ye Wang. 3667-3690 [doi]
- Blind Spot Navigation in Large Language Model Reasoning with Thought Space Explorer. Jinghan Zhang 0002, Fengran Mo, Tharindu Cyril Weerasooriya, Xinyue Ye, Dongjie Wang, Yanjie Fu, Kunpeng Liu 0001. 3691-3707 [doi]
- StrucSum: Graph-Structured Reasoning for Long Document Extractive Summarization with LLMs. Haohan Yuan, Sukhwa Hong, Haopeng Zhang 0005. 3708-3721 [doi]
- Logits-Based Block Pruning with Affine Transformations for Large Language Models. Zekun Hu, Yichu Xu, De-Chuan Zhan. 3722-3736 [doi]
- MultiCW: A Large-Scale Balanced Benchmark Dataset for Training Robust Check-Worthiness Detection Models. Martin Hyben, Sebastian Kula, Ján Cegin, Jakub Simko, Ivan Srba, Róbert Móro. 3737-3754 [doi]
- What Really Matters for Table LLMs? A Meta-Evaluation of Model and Data Effects. Naihao Deng, Sheng Zhang 0029, Henghui Zhu, Shuaichen Chang, Jiani Zhang 0003, Alexander Hanbo Li, Chung-Wei Hang, Hideo Kobayashi, Yiqun Hu, Patrick Ng. 3755-3782 [doi]
- Evaluating Morphological Plausibility of Subword Tokenization via Statistical Alignment with Morpho-Syntactic Features. Abishek Stephen, Jindrich Libovický. 3783-3791 [doi]
- MOSAIC: Masked Objective with Selective Adaptation for In-domain Contrastive Learning. Vera Pavlova, Mohammed Makhlouf. 3792-3807 [doi]
- BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage. Kalyan Nakka, Nitesh Saxena. 3808-3834 [doi]
- The Problem of Ambiguity in Table Question AnsweringJorge Osés Grijalba, L. Alfonso Ureña, Eugenio Martínez-Cámara, José Camacho-Collados. 3835-3848 [doi]
- Beyond Multiple Choice: Evaluating Steering Vectors for SummarizationJoschka Braun, Carsten Eickhoff, Seyed Ali Bahrainian. 3849-3884 [doi]
- Similar Region Search using LLMs on Spatial Feature SpaceAl-Amin Sany, Mohaiminul Islam, Tanzima Hashem, Md. Ashraful Islam, Mohammed Eunus Ali. 3885-3898 [doi]
- Learning to Ask: Multi-Decoder Fine-Tuning for Multi-Hop Visual Question Generation with External KnowledgeArpan Phukan, Manish Gupta, Asif Ekbal. 3899-3918 [doi]
- SLANG-GraphRAG: Multi-Layered Retrieval with Domain-Specific Knowledge for Low Resource Social Media ConversationsIfeoluwa Wuraola, Daniel Marciniak, Nina Dethlefs. 3919-3931 [doi]
- Science Across Languages: Assessing LLM Multilingual Translation of Scientific PapersHannah Calzi Kleidermacher, James Zou. 3932-3947 [doi]
- TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMsMinJae Lee, Wonjun Kang, Byeongkeun Ahn, Christian Classen, Kevin Galim, Seunghyuk Oh, Minghao Yan, Hyung il Koo, Kangwook Lee 0001. 3948-3974 [doi]
- KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM KnowledgeAlex Robertson, Huizhi Liang, Mahbub Gani, Rohit Kumar, Srijith Rajamohan. 3975-3989 [doi]
- Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated CodeJungin Kim, Shinwoo Park, Yo-Sub Han. 3990-4002 [doi]
- VIGiA: Instructional Video Guidance via Dialogue Reasoning and RetrievalDiogo Glória-Silva, David Semedo, João Magalhães. 4003-4030 [doi]
- Attribute-Controlled Translation with Preference OptimizationInigo Jauregi Unanue, Najmeh Sadoughi, Vimal Bhat, Zhu Liu, Massimo Piccardi. 4031-4057 [doi]
- ReciFine: Finely Annotated Recipe Dataset for Controllable Recipe GenerationNuhu Ibrahim, Rishi Ravikumar, Robert Stevens, Riza Batista-Navarro. 4058-4074 [doi]
- ReBPE: Iteratively Improving the Internal Structure of a Structured Tokeniser by Mining its Internal StructureThomas Bauwens, Miryam de Lhoneux. 4075-4090 [doi]
- Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical FusionXiao Li, Kotaro Funakoshi, Manabu Okumura. 4091-4106 [doi]
- Demystifying Mixed Outcomes of Self-Training: Pre-training Analyses on Non-Toy LLMsYusuke Nakamura, Hirokazu Kiyomaru, Chaoran Liu, Shuhei Kurita, Daisuke Kawahara. 4107-4113 [doi]
- Revealing Redundant Syntax in Large Language Models through Multi-Hop Dependency PathsMasaki Sashida, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo. 4114-4137 [doi]
- A Scalable Framework for Automated NER Annotation Correction in Low-Resource LanguagesToqeer Ehsan, Thamar Solorio. 4138-4151 [doi]
- Can ChatGPT Really Understand Modern Chinese Poetry?Shanshan Wang 0009, Derek F. Wong, Jingming Yao, Lidia S. Chao. 4152-4162 [doi]
- Knowing What's Missing: Assessing Information Sufficiency in Question AnsweringAkriti Jain 0001, Aparna Garimella. 4163-4174 [doi]
- The Curse of Verbalization: How Presentation Order Constrains LLM ReasoningYue Zhou, Henry Peng Zou, Barbara Di Eugenio, Yang Zhang. 4175-4185 [doi]
- PATS: Personality-Aware Teaching Strategies with Large Language Model TutorsDonya Rooein, Sankalan Pal Chowdhury, Mariia Eremeeva, Yuan Qin, Debora Nozza, Mrinmaya Sachan, Dirk Hovy. 4186-4211 [doi]
- Mitigating Causal Bias in LLMs via Potential Outcomes Framework and Actual Causality TheoryYiheng Zhao, Yuanliang Li, Shreya Savant, Jun Yan. 4212-4222 [doi]
- JuriFindIT: an Italian legal retrieval datasetNiko Dalla Noce, Davide Colla, Sina Farhang Doust, Lorenzo De Mattei, Davide Bacciu. 4223-4241 [doi]
- Towards Fair and Efficient De-identification: Quantifying the Efficiency and Generalizability of De-identification ApproachesNoopur Zambare, Kiana Aghakasiri, Carissa Lin, Carrie Ye, J. Ross Mitchell, Mohamed Abdalla. 4242-4257 [doi]
- How Many Ratings per Item are Necessary for Reliable Significance Testing?Christopher M. Homan, Flip Korn, Deepak Pandita, Chris Welty. 4258-4273 [doi]
- QFrBLiMP: a Quebec-French Benchmark of Linguistic Minimal PairsDavid Beauchemin, Pier-Luc Veilleux, Richard Khoury, Johanna-Pascale Roy. 4274-4304 [doi]
- QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion TaskMae Sosto, Delfina Sol Martinez Pandiani, Laura Hollink. 4305-4326 [doi]
- Efficient Table Retrieval and Understanding with Multimodal Large Language ModelsZhuoyan Xu, Haoyang Fang, Boran Han, Bonan Min, Bernie Wang 0001, Cuixiong Hu, Shuai Zhang 0007. 4327-4340 [doi]
- FedReFT: Federated Representation Fine-Tuning with All-But-Me AggregationFatema Siddika, Md. Anwar Hossen, Juan Pablo Muñoz, Tanya G. Roosta, Anuj Sharma 0001, Ali Jannesari. 4341-4362 [doi]
- RiddleBench: A New Generative Reasoning Benchmark for LLMsDeepon Halder, Alan Saji, Thanmay Jayakumar, Anoop Kunchukuttan, Ratish Puduppully, Raj Dabre. 4363-4372 [doi]
- Language Model-Driven Data Pruning Enables Efficient Active LearningAbdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza. 4373-4392 [doi]
- HARM: Learning Hate-Aware Reward Model for Evaluating Natural Language Explanations of Offensive ContentLorenzo Puppi Vecchi, Alceu de Souza Britto Jr., Emerson Cabrera Paraiso, Rafael M. O. Cruz. 4393-4431 [doi]
- MATH-IDN: A Multilingual Mathematical Problem Solving Dataset Featuring Local Languages in IndonesiaXiao Xiao, Iftitahu Ni'mah, Yuyun Wabula, Mykola Pechenizkiy, Meng Fang. 4432-4438 [doi]
- Parameter-Efficient Routed Fine-Tuning: Mixture-of-Experts Demands Mixture of Adaptation ModulesYilun Liu, Yunpu Ma, Yuetian Lu, Shuo Chen 0014, Zifeng Ding, Volker Tresp. 4439-4457 [doi]
- MAPRO: Recasting Multi-Agent Prompt Optimization as Maximum a Posteriori InferenceZheyuan Zhang, Lin Ge, Hongjiang Li, Weicheng Zhu, Chuxu Zhang, Yanfang Ye 0001. 4458-4480 [doi]
- Debiasing Large Language Models via Adaptive Causal Prompting with Sketch-of-ThoughtBowen Li 0012, Ziqi Xu 0001, Jing Ren 0001, Renqiang Luo, Xikun Zhang 0007, Xiuzhen Zhang 0001, Yongli Ren, Feng Xia 0001. 4481-4499 [doi]
- ExpressivityBench: Can LLMs Communicate Implicitly?Joshua Tint, Som Sagar, Aditya Taparia, Kelly Raines, Bimsara Pathiraja, Caleb Liu, Ransalu Senanayake. 4500-4515 [doi]
- Rethinking Schema Linking: A Context-Aware Bidirectional Retrieval Approach for Text-to-SQLMd Mahadi Hasan Nahid, Davood Rafiei, Weiwei Zhang, Yong Zhang. 4516-4546 [doi]
- PEAR: Planner-Executor Agent Robustness BenchmarkShen Dong, Mingxuan Zhang, Pengfei He, Li Ma, Bhavani Thuraisingham, Hui Liu, Yue Xing 0002. 4547-4567 [doi]
- Toward Safe and Human-Aligned Game Conversational Recommendation via Multi-Agent DecompositionZheng Hui, Xiaokai Wei, Yexi Jiang, Kevin Gao, Chen Wang, Se-eun Yoon, Rachit Pareek, Michelle Gong. 4568-4584 [doi]
- Linguistic Cues for LLM-based Implicit Discourse Relation ClassificationYi Fan, Michael Strube, Wei Liu. 4585-4602 [doi]
- SpARK: An Embarrassingly Simple Sparse Watermarking in LLMs with Enhanced Text QualityDuy C. Hoang, Hung T. Q. Le, Rui Chu, Ping Li 0001, Weijie Zhao 0001, Yingjie Lao, Khoa D. Doan. 4603-4626 [doi]
- Pretraining Language Models for Diachronic Linguistic Change DiscoveryElisabeth Fittschen, Sabrina Li, Tom Lippincott, Leshem Choshen, Craig Messner. 4627-4642 [doi]
- Improving Language Identification for Code-Switched Speech: The Pivotal Role of Accented EnglishAdyasha Patra, Dhiraj Kumar Sah, Preethi Jyothi. 4643-4656 [doi]
- Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory RetrievalAditya Sharma, Christopher Pal, Amal Zouaq. 4657-4668 [doi]
- Jailbreaking Safeguarded Text-to-Image Models via Large Language ModelsZhengyuan Jiang, Yuepeng Hu, Yuchen Yang 0001, Yinzhi Cao, Neil Zhenqiang Gong. 4669-4684 [doi]
- BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio ReconstructionHaoran Wang, Jiatong Shi, Jinchuan Tian, Bohan Li, Kai Yu, Shinji Watanabe 0001. 4685-4697 [doi]
- Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box OptimizationJiwei Guan, Hai Jin 0001, Haohan Wang. 4698-4708 [doi]
- SALT: Step-level Advantage Assignment for Long-horizon Agents via Trajectory GraphJiazheng Li 0012, Yawei Wang, Qiaojing Yan, Yijun Tian 0001, Zhichao Xu 0001, Huan Song, Panpan Xu, Lin Lee Cheong. 4709-4725 [doi]
- UniToolBench: A Benchmark for Tool-Augmented LLMs in Cross-Domain, Universal Task AutomationXiaojie Guo 0002, Yang Zhang, Bing Zhang, Ryo Kawahara, Mikio Takeuchi, Yada Zhu. 4726-4736 [doi]
- Benchmarking the Energy Savings with Speculative Decoding StrategiesRohit Dutta, Paramita Koley, Soham Poddar, Janardan Misra, Sanjay Podder, Naveen Balani, Saptarshi Ghosh 0001, Niloy Ganguly. 4737-4748 [doi]
- Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation UnderstandingYeonkyoung So, Gyuseong Lee, Sungmok Jung, Joonhak Lee, Jia Kang, Sangho Kim, Jaejin Lee. 4749-4793 [doi]
- What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM PerformanceWilliam Watson, Nicole Cho, Sumitra Ganesh, Manuela Veloso. 4794-4827 [doi]
- Completely Modular Fine-tuning for Dynamic Language AdaptationZhe Cao 0002, Yusuke Oda, Qianying Liu, Akiko Aizawa, Taro Watanabe. 4828-4845 [doi]
- A Multi-Task Learning Framework for Modeling Engagement and Topic-Sensitive Responses in Arabic Women's DiscourseMabrouka Bessghaier, Md. Rafiul Biswas, Shimaa Ibrahim, Wajdi Zaghouani. 4846-4854 [doi]
- We Are What We Repeatedly Do: Improving Long Context Instruction FollowingPreston K. Robinette, Andrew Hard, Swaroop Ramaswamy, Ehsan Amid, Rajiv Mathews, Taylor T. Johnson. 4855-4884 [doi]
- ConRAS: Contrastive In-context Learning Framework for Retrieval-Augmented SummarizationJuseon-Do, Sungwoo Han, Jingun Kwon, Hidetaka Kamigaito, Manabu Okumura. 4885-4900 [doi]
- Beyond Sampling: Self-Sorting for Long-Context RankingJuseon-Do, Sungwoo Han, Jingun Kwon, Hidetaka Kamigaito, Katsuhiko Hayashi 0001, Taro Watanabe. 4901-4910 [doi]
- Program-of-Thought Reveals LLM Abstraction CeilingsMike Zhou, Fenil Bardoliya, Vivek Gupta 0001, Dan Roth 0001. 4911-4919 [doi]
- From Numbers to Narratives: Efficient Language Model-Based Detection for Safety-Critical Minority ClassesAhatsham Hayat, Hunter Tridle, Mohammad Rashedul Hasan. 4920-4937 [doi]
- R-GDA: Reflective Guidance Data Augmentation with Multi-Agent Feedback for Domain-Specific Named Entity RecognitionHyeonseok Kang, Hyuk Namgoong, Goun Pyeon, Sangkeun Jung. 4938-4953 [doi]
- Enabling Autoregressive Models to Fill In Masked TokensDaniel Israel, Aditya Grover, Guy Van den Broeck. 4954-4965 [doi]
- Position Encoding with Random Float Sampling Enhances Length Generalization of TransformersAtsushi Shimizu, Shohei Taniguchi, Yutaka Matsuo. 4966-4980 [doi]
- Open-Domain Safety Policy ConstructionDi Wu, Siyue Liu, Zixiang Ji, Ya-Liang Chang, Zhe Yu Liu, Andrew Pleffer, Kai-Wei Chang. 4981-4999 [doi]
- Think Just Enough: Leveraging Self-Assessed Confidence for Adaptive Reasoning in Language ModelsJunyeob Kim, Sang-goo Lee, Taeuk Kim. 5000-5006 [doi]
- CLICKER: Cross-Lingual Knowledge Editing via In-Context Learning with Adaptive Stepwise ReasoningZehui Jiang, Xin Zhao, Yuta Kumadaki, Naoki Yoshinaga 0001. 5007-5022 [doi]
- Show or Tell? Modeling the evolution of request-making in Human-LLM conversationsShengqi Zhu 0002, Jeffrey M. Rzeszotarski, David Mimno. 5023-5034 [doi]
- Multilingual Self-Taught Faithfulness EvaluatorsCarlo Alfano, Aymen Al Marjani, Zeno Jonke, Amin Mantrach, Saab Mansour, Marcello Federico. 5035-5051 [doi]
- Benchmarking Direct Preference Optimization for Medical Large Vision-Language ModelsDain Kim, Jiwoo Lee, Jaehoon Yun, Yonghoe Koo, Qingyu Chen 0001, Hyunjae Kim, Jaewoo Kang. 5052-5067 [doi]
- Stay Focused: Problem Drift in Multi-Agent DebateJonas Becker, Lars Benedikt Kaesberg, Andreas Stephan, Jan Philip Wahle, Terry Ruas, Bela Gipp. 5068-5102 [doi]
- FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness EvaluationYulia Otmakhova 0001, Thinh Hung Truong, Rahmad Mahendra, Zenan Zhai, Rongxin Zhu, Daniel Beck, Jey Han Lau. 5103-5123 [doi]
- Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMsYiheng Yang, Yujie Wang, Chi Ma, Lei Yu, Emmanuele Chersoni, Chu-Ren Huang. 5124-5138 [doi]
- PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHingAnthony Hughes, Vasisht Duddu, N. Asokan, Nikolaos Aletras, Ning Ma 0002. 5139-5153 [doi]
- Argument Component Segmentation with Fine-Tuned Large Language ModelsEttore Caputo, Sergio Greco, Lucio La Cava. 5154-5167 [doi]
- DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP ProtectionYuliang Yan, Haochun Tang, Shuo Yan, Enyan Dai. 5168-5184 [doi]
- The Art of Saying "Maybe": A Conformal Lens for Uncertainty Benchmarking in VLMsAsif Azad, Mohammad Sadat Hossain, MD Sadik Hossain Shanto, M. Saifur Rahman, Md. Rizwan Parvez. 5185-5201 [doi]
- Diagnosis of Dysarthria Severity and Explanation Generation Using XAI-Enhanced CLINIC-GENIE on Diadochokinetic TasksJihyeon Kim, Insung Lee, Myoung-Wan Koo. 5202-5222 [doi]
- A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across LanguagesRaoyuan Zhao, Yihong Liu 0001, Hinrich Schütze, Michael A. Hedderich. 5223-5247 [doi]
- ORSO QGen: Odds-Ratio Steerable Optimization for Controlling Question GenerationAndreea-Nicoleta Dutulescu, Stefan Ruseti, Mihai Dascalu, Danielle S. McNamara. 5248-5259 [doi]
- Leveraging Digitized Newspapers to Collect Summarization Data in Low-Resource LanguagesNoam Dahan, Omer Kidron, Gabriel Stanovsky. 5260-5273 [doi]
- Let's Simplify Step by Step: Guiding LLM Towards Multilingual Unsupervised Proficiency-Controlled Sentence SimplificationJingshen Zhang, Xin Ying Qiu, Lifang Lu, Zhuhua Huang, Yutao Hu, Yuechang Wu, Junyu Lu. 5274-5290 [doi]
- LogToP: Logic Tree-of-Program with Table Instruction-tuned LLMs for Controlled Logical Table-to-Text GenerationYupian Lin, Guangya Yu, Cheng Yuan, Huan Du, Hui Luo, Yuang Bian, JingPing Liu, Zhidong He, Wen Du, Tong Ruan. 5291-5303 [doi]
- IRPO: Implicit Policy Regularized Preference OptimizationYoungsoo Jang, Yu-Jin Kim, Geon-hyeong Kim, Honglak Lee, Moontae Lee. 5304-5325 [doi]
- DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM PerformanceSeffi Cohen, Nurit Cohen-Inger, Niv Goldshlager, Bracha Shapira, Lior Rokach. 5326-5336 [doi]
- Ranking Human and LLM Texts Using Locality StatisticsYiyang Wang, Chen Ding, Hangfeng He. 5337-5348 [doi]
- MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga UnderstandingJeonghun Baek, Kazuki Egashira, Shota Onohara, Atsuyuki Miyai, Yuki Imajuku, Hikaru Ikuta, Kiyoharu Aizawa. 5349-5370 [doi]
- Hierarchical User Intent Inference with Knowledge Graph GroundingTzu-Cheng Peng, Chien Chin Chen, Yung-Chun Chang. 5371-5377 [doi]
- Improving the OOD Performance of Closed-Source LLMs on NLI Through Strategic Data SelectionJoe Stacey, Lisa Alazraki, Aran Ubhi, Beyza Ermis, Aaron Mueller, Marek Rei. 5378-5404 [doi]
- MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language ModelsSiwei Wu, King Zhu, Yu Bai, Yiming Liang, Yizhi Li, Haoning Wu, Jiaheng Liu, Ruibo Liu, Xingwei Qu, Xuxin Cheng, Ge Zhang 0009, Wenhao Huang 0001, Chenghua Lin. 5405-5419 [doi]
- COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human ValuesSiwei Wu, Jincheng Ren, Xeron Du, Shuyue Guo, Xingwei Qu, Yiming Liang, Jie Liu, Yunwen Li, Tyler Loakman, Tianyu Zheng, Boyu Feng, Huaqing Yuan, Zili Wang, Jiaheng Liu, Wenhao Huang 0001, Chenglin Cai, Haoran Que, Jian Yang 0003, Yuelin Bai, Zekun Moore Wang, Zhouliang Yu, Qunshu Lin, Ding Pan, Yuchen Eleanor Jiang, Tiannan Wang, Wangchunshu Zhou, Shenzhi Wang, Xingyuan Bu, Minghao Liu 0003, Guoyin Wang, Ge Zhang 0009, Chenghua Lin. 5420-5447 [doi]
- Revealing the Numeracy Gap: An Empirical Investigation of Text Embedding ModelsNingyuan Deng, Hanyu Duan, Yixuan Tang, Yi Yang. 5448-5461 [doi]
- code-transformed: The Influence of Large Language Models on CodeYuliang Xu, Siming Huang, Mingmeng Geng, Yao Wan 0001, Xuanhua Shi, Dongping Chen. 5462-5490 [doi]
- Do LLMs model human linguistic variation? A case study in Hindi-English Verb code-mixingMukund Choudhary, Madhur Jindal, Gaurja Aeron, Monojit Choudhury. 5491-5509 [doi]
- ART: Attention-Regularized Transformers for Multi-Modal RobustnessMohammed Bouri, Mohammed Erradi, Adnane Saoud. 5510-5535 [doi]
- GRAFF: GRaph-Augmented Fine-grained Fusion for Large Language ModelsHimanshu Chaudhary, Ruida Wang, Gowtham Ramesh, Junjie Hu. 5536-5547 [doi]
- Tackling Distractor Documents in Multi-Hop QA with Reinforcement and Curriculum LearningJerry Huang, Siddarth Madala, Risham Sidhu, Cheng Niu, Hao Peng, Julia Hockenmaier, Tong Zhang 0001. 5548-5561 [doi]
- RoD-TAL: A Benchmark for Answering Questions in Romanian Driving License ExamsAndrei Vlad Man, Razvan-Alexandru Smadu, Cristian-George Craciun, Dumitru-Clementin Cercel, Florin Pop, Mihaela-Claudia Cercel. 5562-5602 [doi]
- FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMsAlbert Sawczyn, Jakub Binkowski, Denis Janiak, Bogdan Gabrys, Tomasz Kajdanowicz. 5603-5621 [doi]
- Punctuations and Predicates in Language ModelsSonakshi Chauhan, Maheep Chaudhary, Kwan Kiu Choy, Samuel Nellessen, Nandi Schoots. 5622-5636 [doi]
- Test-time Corpus Feedback: From Retrieval to RAGMandeep Rathee, Venktesh V, Sean MacAvaney, Avishek Anand. 5637-5656 [doi]
- RADAR: A Reasoning-Guided Attribution Framework for Explainable Visual Data AnalysisAnku Rani, Aparna Garimella, Apoorv Saxena, Balaji Vasan Srinivasan, Paul Pu Liang. 5657-5677 [doi]
- MaskLoRA: Low-Rank Subspace-Induced Token Masking for Efficient and Faithful Language ModelsRifat Rafiuddin. 5678-5692 [doi]
- A Domain-Specific Curated Benchmark for Entity and Document-Level Relation ExtractionMarco Martinelli 0003, Stefano Marchesin 0001, Vanessa Bonato, Giorgio Maria Di Nunzio, Nicola Ferro 0001, Ornella Irrera, Laura Menotti, Federica Vezzani, Gianmaria Silvello. 5693-5711 [doi]
- What Matters to an LLM? Behavioral and Computational Evidences from SummarizationYongxin Zhou 0004, Changshun Wu, Philippe Mulhem, Didier Schwab, Maxime Peyrard. 5712-5737 [doi]
- Neural network embeddings recover value dimensions from psychometric survey items on par with human dataMax Pellert, Clemens Lechner, Indira Sen, Markus Strohmaier. 5738-5752 [doi]
- Compositional Reasoning via Joint Image and Language DecompositionDwip Dalal, Madhav Kanda, Zhenhailong Wang, Heng Ji 0001, Unnat Jain. 5753-5775 [doi]
- Better Call CLAUSE: A Discrepancy Benchmark for Auditing LLMs Legal Reasoning CapabilitiesManan Roy Choudhury, Adithya Chandramouli, Mannan Anand, Vivek Gupta. 5776-5818 [doi]
- Token-Wise Kernels (TWiKers) for Vicinity-Aware Attention in TransformersKuangdai Leng, Jia Bi, Samuel Pinilla, Jaehoon Cha. 5819-5835 [doi]
- Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pre-trainingJeffrey Li, Joshua P. Gardner, Doug Kang, Fangping Shi, Karanjeet Singh 0003, Chun-Liang Li, Herumb Shandilya, David Leo Wright Hall, Oncel Tuzel, Percy Liang, Ludwig Schmidt, Hadi Pouransari, Fartash Faghri. 5836-5861 [doi]
- Can Models Help Us Create Better Models? Evaluating LLMs as Data ScientistsMichal Pietruszka, Lukasz Borchmann, Aleksander Jedrosz, Pawel Morawiecki. 5862-5886 [doi]
- Distill and Align Decomposition for Enhanced Claim VerificationJabez Magomere, Elena Kochkina, Samuel Mensah, Simerjot Kaur, Fernando Acero, Arturo Oncevay, Charese Smiley, Xiaomo Liu, Manuela Veloso. 5887-5912 [doi]
- Argument-Based Consistency in Toxicity Explanations of LLMsRamaravind Kommiya Mothilal, Joanna Roy, Syed Ishtiaque Ahmed, Shion Guha. 5913-5941 [doi]
- Reasoning Beyond Literal: Cross-style Multimodal Reasoning for Figurative Language UnderstandingSeyyed Saeid Cheshmi, Hahnemann Ortiz, James Mooney, Dongyeop Kang. 5942-5956 [doi]
- QueStER: Query Specification for Generative Keyword-Based RetrievalArthur Satouf, Yuxuan Zong, Habiboulaye Amadou Boubacar, Pablo Piantanida, Benjamin Piwowarski. 5957-5968 [doi]
- Evaluating Sparse Autoencoders for Monosemantic RepresentationMoghis Fereidouni, Muhammad Umair Haider, Peizhong Ju, A. B. Siddique 0001. 5969-5984 [doi]
- Event Detection with a Context-Aware Encoder and LoRA for Improved Performance on Long-Tailed ClassesAbdullah Al-Monsur, Nitesh Vamshi Bommisetty, Gene Louis Kim. 5985-6003 [doi]
- Think Hard Only When Needed: A Hybrid Best-of-N and Beam Search for Efficient Test-Time ComputeHyewon Suh, Chaojian Li, Cheng-Jhih Shih, Zheng Wang, Kejing Xia, Yonggan Fu, Yingyan Celine Lin. 6004-6017 [doi]
- Role-Conditioned Refusals: Evaluating Access Control Reasoning in Large Language ModelsDorde Klisura, Joseph Khoury, Ashish Kundu, Ram Krishnan, Anthony Rios. 6018-6034 [doi]
- NL2Logic: AST-Guided Translation of Natural Language into First-Order Logic with Large Language ModelsRizky Ramadhana Putra, Raihan Sultan Pasha Basuki, Yutong Cheng, Peng Gao. 6035-6051 [doi]
- Coding Agents with Multimodal Browsing are Generalist Problem SolversAditya Bharat Soni, Boxuan Li, Xingyao Wang 0002, Valerie Chen, Graham Neubig. 6052-6069 [doi]
- Quantifying Data Contamination in Psychometric Evaluations of LLMsJongwook Han, Woojung Song, Jonggeun Lee, Yohan Jo. 6070-6088 [doi]
- Task-aware Block Pruning with Output Distribution Signals for Large Language ModelsSong-ha Jo, Youngrok Ko, Sang-goo Lee, Jinseok Seol. 6089-6107 [doi]
- LARA: LLM-based Agile Power Distribution Network Restoration from Disastrous EventsJishnu Warrier, Heqing Huang, Yuzhang Lin, Sai Qian Zhang. 6108-6116 [doi]
- Evaluating Multi-Hop Reasoning in Large Language Models: A Chemistry-Centric BenchmarkMohammad Khodadad, Ali Shiraee Kasmaee, Mahdi Astaraki, Nicholas Sherck, Hamidreza Mahyar, Soheila Samiee. 6117-6143 [doi]
- SD-E2: Semantic Exploration for Reasoning Under Token BudgetsKshitij Mishra, Nils Lukas, Salem Lahlou. 6144-6157 [doi]
- How to Contextualize Empirical Data for Risk Analysis with LLMs: A Case Study of Power OutagesHaiyun Huang, Yukun Li, Marco A Pretell, Jacob Naroian, Ebadah Khan, Liping Liu. 6158-6172 [doi]
- Thinking Beyond the Local: Multi-View Instructed Adaptive Reasoning in KG-Enhanced LLMsMinghan Zhang, Shu Zhao 0005, Zhen Yang 0010, Hongsheng Wu, Yongxing Lin, Haodong Zou, Jie Chen 0025, Zhen Duan. 6173-6188 [doi]
- DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable AttributionLekkala Sai Teja, Siva Gopala Krishna Nuthakki, Ufaq Khan, Muhammad Haris Khan, Atul Mishra. 6189-6206 [doi]
- FINEST: Improving LLM Responses to Sensitive Topics Through Fine-Grained EvaluationJuhyun Oh, Nayeon Lee, Chani Jung, Jiho Jin, Junho Myung, Jongwon Lee, Taieui Song, Alice Oh. 6207-6226 [doi]
- Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMsJunbo Li, Peng Zhou, Rui Meng, Meet P. Vadera, Lihong Li, Yang Li. 6227-6243 [doi]
- Decoding Time Series with LLMs: A Multi-Agent Framework for Cross-Domain AnnotationMinhua Lin, Zhengzhang Chen, Yanchi Liu, Xujiang Zhao, Zongyu Wu 0001, Junxiang Wang, Xiang Zhang 0001, Suhang Wang, Haifeng Chen. 6244-6281 [doi]
- Multi-Hall-SA: A Cross-lingual Benchmark for Multi-Type Hallucination Detection in Low-Resource South African LanguagesSello Ralethe, Jan Buys. 6282-6296 [doi]
- Query4Regex: Verifiable Regex Transformation through Formal Operations from NL and DSL QueriesJoonghyuk Hahn, Yo-Sub Han. 6297-6305 [doi]
- SrcMix: Mixing of Related Source Languages Benefits Extremely Low-resource Machine TranslationSanjeev Kumar 0007, Preethi Jyothi, Pushpak Bhattacharyya. 6306-6323 [doi]
- IMRNNs: An Efficient Method for Interpretable Dense Retrieval via Embedding ModulationYash Saxena, Ankur Padia, Kalpa Gunaratna, Manas Gaur. 6324-6337 [doi]
- MMUIE: Massive Multi-Domain Universal Information Extraction for Long DocumentsShuyi Zhang, Zhenbin Chen, Shuting Li, Kewei Tu, Li Jing, Zixia Jia, Zilong Zheng. 6338-6370 [doi]
- Learning to Judge: LLMs Designing and Applying Evaluation RubricsClemencia Siro, Pourya Aliannejadi, Mohammad Aliannejadi. 6371-6389 [doi]
- PsyProbe: Proactive and Interpretable Dialogue through User State Modeling for Exploratory CounselingSohhyung Park, Hyunji Kang, Sungzoon Cho, Dongil Kim. 6390-6411 [doi]
- Learning from Child-directed Speech in Two-language Scenarios: A French-English Case-StudyLiel Binyamin, Elior Sulem. 6412-6426 [doi]
- DeVisE: Towards the Behavioral Testing of Medical Large Language ModelsCamila Zurdo Tagliabue, Heloísa Oss Boll, Aykut Erdem, Erkut Erdem, Iacer Calixto. 6427-6441 [doi]
- Sequence Repetition Enhances Token Embeddings and Improves Sequence Labeling with Decoder-only Language ModelsMatija Luka Kukic, Marko Culjak, David Dukic, Martin Tutek, Jan Snajder. 6442-6456 [doi]
- MEENA (PersianMMMU): Multimodal-Multilingual Educational Exams for N-level AssessmentOmid Ghahroodi, Arshia Hemmat, Marzia Nouri, Seyed Mohammad Hadi Hosseini, Doratossadat Dastgheib, Mohammad V. Sanian, Alireza Sahebi, Reihaneh Zohrabi, Mohammad Hossein Rohban, Ehsaneddin Asgari, Mahdieh Soleymani Baghshah. 6457-6491 [doi]
- Teaching Old Tokenizers New Words: Efficient Tokenizer Adaptation for Pretrained ModelsTaido Purason, Pavel Chizhov, Ivan P. Yamshchikov, Mark Fishel. 6492-6516 [doi]
- AGIC: Attention-Guided Image Captioning to Improve Caption RelevanceLekkala Sai Teja, Ashok Urlana, Pruthwik Mishra. 6517-6528 [doi]
- Visual-Linguistic Abductive Reasoning with LLMs for Knowledge-based Visual Question AnsweringJieun Kim, Yujin Jeong, Sung-Bae Cho. 6529-6544 [doi]
- FactAppeal: Identifying Epistemic Factual Appeals in News MediaGuy Mor-Lan, Tamir Sheafer, Shaul R. Shenhav. 6545-6556 [doi]
- Vietnamese Automatic Speech Recognition: A RevisitThi Vu, Linh The Nguyen, Dat Quoc Nguyen. 6557-6568 [doi]
- MapCoder-Lite: Distilling Multi-Agent Coding into a Single Small LLMWoongkyu Lee, Junhee Cho, Jungwook Choi. 6569-6596 [doi]
- When Do Language Models Endorse Limitations on Human Rights Principles?Keenan Samway, Miu Nicole Takagi, Rada Mihalcea, Bernhard Schölkopf, Ilias Chalkidis, Daniel Hershcovich, Zhijing Jin 0001. 6597-6623 [doi]
- Abstractive Summarization of Bengali Academic Videos Based on Audio SubtitlesLamisa Bintee Mizan Deya, Farhatun Shama, Abdul Aziz 0005, Md Kaykobad Reza, Md. Shahidul Salim. 6624-6643 [doi]
- Active Learning with Non-Uniform Costs for African Natural Language ProcessingBonaventure F. P. Dossou, Ines Arous, Audrey Durand, Jackie Chi Kit Cheung. 6644-6656 [doi]
- CrisiText: A dataset of warning messages for LLM training in emergency communicationGiacomo Gonella, Gian Maria Campedelli, Stefano Menini, Marco Guerini. 6657-6677 [doi]
- Training-Free Text Emotion Tagging via LLM-Based Best-Worst ScalingLukas Christ, Shahin Amiriparian. 6678-6694 [doi]
- Scaling Cultural Resources for Improving Generative ModelsHayk Stepanyan, Aishwarya Verma, Andrew Zaldivar, Rutledge Chin Feman, Erin MacMurray van Liemt, Charu Kalia, Vinodkumar Prabhakaran, Sunipa Dev. 6695-6709 [doi]
- Cards Against Contamination: TCG-Bench for Difficulty-Scalable Multilingual LLM ReasoningSultan Alrashed, Jianghui Wang, Francesco Orabona. 6710-6724 [doi]
- ILSIC: Corpora for Identifying Indian Legal Statutes from Queries by LaymenShounak Paul, Raghav Dogra, Pawan Goyal 0002, Saptarshi Ghosh 0001. 6725-6746 [doi]