- Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits. Woojin Jeong, Seungki Min. 16-28
- Graph Neural Thompson Sampling. Shuang Wu, Arash A. Amini. 29-63
- JoinGym: An Efficient Join Order Selection Environment. Junxiong Wang, Kaiwen Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun 0002. 64-91
- An Open-Loop Baseline for Reinforcement Learning Locomotion Tasks. Antonin Raffin, Olivier Sigaud, Jens Kober, Alin Albu-Schäffer, João Silvério, Freek Stulp. 92-107
- Online Planning in POMDPs with State-Requests. Raphaël Avalos, Eugenio Bargiacchi, Ann Nowé, Diederik M. Roijers, Frans A. Oliehoek. 108-129
- A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning. Abdulaziz Almuzairee, Nicklas Hansen 0001, Henrik I. Christensen. 130-157
- BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations. Robert J. Moss, Anthony Corso 0001, Jef Caers, Mykel J. Kochenderfer. 158-181
- Non-adaptive Online Finetuning for Offline Reinforcement Learning. Audrey Huang, Mohammad Ghavamzadeh, Nan Jiang, Marek Petrik. 182-197
- Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning. Nicholas E. Corrado, Yuxiao Qu, John U. Balis, Adam Labiosa, Josiah P. Hanna. 198-215
- Towards Principled, Practical Policy Gradient for Bandits and Tabular MDPs. Michael Lu, Matin Aghaei, Anant Raj, Sharan Vaswani. 216-282
- Unifying Model-Based and Model-Free Reinforcement Learning with Equivalent Policy Sets. Benjamin Freed, Thomas Wei, Roberto Calandra, Jeff Schneider 0001, Howie Choset. 283-301
- The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation. Noah Golowich, Ankur Moitra. 302-341
- Learning Action-based Representations Using Invariance. Max Rudolph, Caleb Chuck, Kevin Black, Misha Lvovsky, Scott Niekum, Amy Zhang 0001. 342-365
- Cyclicity-Regularized Coordination Graphs. Oliver Järnefelt, Mahdi Kallel, Carlo D'Eramo. 366-379
- Assigning Credit with Partial Reward Decoupling in Multi-Agent Proximal Policy Optimization. Aditya Kapoor, Benjamin Freed, Jeff Schneider 0001, Howie Choset. 380-399
- OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments. Quentin Delfosse, Jannis Blüml, Bjarne Gregori, Sebastian Sztwiertnia, Kristian Kersting. 400-449
- SplAgger: Split Aggregation for Meta-Reinforcement Learning. Jacob Beck, Matthew Thomas Jackson, Risto Vuorio, Zheng Xiong, Shimon Whiteson. 450-469
- A Tighter Convergence Proof of Reverse Experience Replay. Nan Jiang 0012, Jinzhao Li, Yexiang Xue. 470-480
- Learning to Optimize for Reinforcement Learning. Qingfeng Lan, A. Rupam Mahmood, Shuicheng Yan, Zhongwen Xu. 481-497
- Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras. Mhairi Dunion, Stefano V. Albrecht. 498-515
- Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning. Trevor McInroe, Adam Jelley, Stefano V. Albrecht, Amos J. Storkey. 516-546
- Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning. Adriana Hugessen, Roger Creus Castanyer, Faisal Mohamed, Glen Berseth. 547-562
- Mitigating the Curse of Horizon in Monte-Carlo Returns. Alex Ayoub, David Szepesvari, Francesco Zanini, Bryan Chan, Dhawal Gupta, Bruno Castro da Silva, Dale Schuurmans. 563-572
- A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization. Yudong Luo, Yangchen Pan, Han Wang, Philip Torr 0001, Pascal Poupart. 573-592
- ROIL: Robust Offline Imitation Learning without Trajectories. Gersi Doko, Guang Yang, Daniel S. Brown, Marek Petrik. 593-605
- Harnessing Discrete Representations for Continual Reinforcement Learning. Edan Meyer, Adam White 0001, Marlos C. Machado. 606-628
- Three Dogmas of Reinforcement Learning. David Abel, Mark K. Ho, Anna Harutyunyan. 629-644
- Policy Gradient with Active Importance Sampling. Matteo Papini, Giorgio Manganini, Alberto Maria Metelli, Marcello Restelli. 645-675
- The Limits of Pure Exploration in POMDPs: When the Observation Entropy is Enough. Riccardo Zamboni, Duilio Cirino, Marcello Restelli, Mirco Mutti. 676-692
- Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning. Zakariae El Asri, Olivier Sigaud, Nicolas Thome. 693-713
- Trust-based Consensus in Multi-Agent Reinforcement Learning Systems. Ho Long Fung, Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi. 714-732
- Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies. Yu Luo, Fuchun Sun 0001, Tianying Ji, Xianyuan Zhan. 733-762
- Informed POMDP: Leveraging Additional Information in Model-Based RL. Gaspard Lambrechts, Adrien Bolland, Damien Ernst. 763-784
- An Optimal Tightness Bound for the Simulation Lemma. Sam Lobel, Ronald Parr. 785-797
- Best Response Shaping. Milad Aghajohari, Tim Cooijmans, Juan Agustin Duque, Shunichi Akatsuka, Aaron C. Courville. 798-818
- A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning. Gianluca Drappo, Alberto Maria Metelli, Marcello Restelli. 819-839
- SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning. Khurram Javed, Arsalan Sharifnassab, Richard S. Sutton. 840-863
- The Cliff of Overcommitment with Policy Gradient Step Sizes. Scott M. Jordan, Samuel Neumann, James E. Kostas, Adam White 0001, Philip S. Thomas. 864-883
- Multistep Inverse Is Not All You Need. Alexander Levine 0001, Peter Stone, Amy Zhang 0001. 884-925
- Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors. Emma Cramer, Bernd Frauenknecht, Ramil Sabirov, Sebastian Trimpe. 926-945
- Sequential Decision-Making for Inline Text Autocomplete. Rohan Chitnis, Shentao Yang, Alborz Geramifard. 946-960
- Exploring Uncertainty in Distributional Reinforcement Learning. Georgy Antonov, Peter Dayan. 961-978
- Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning. Marcel Hussing, Jorge A. Mendez, Anisha Singrodia, Cassandra Kent, Eric Eaton. 979-994
- Dissecting Deep RL with High Update Ratios: Combatting Value Divergence. Marcel Hussing, Claas Voelcker, Igor Gilitschenski, Amir Massoud Farahmand, Eric Eaton. 995-1018
- Demystifying the Recency Heuristic in Temporal-Difference Learning. Brett Daley, Marlos C. Machado, Martha White. 1019-1036
- On the consistency of hyper-parameter selection in value-based deep reinforcement learning. Johan Samir Obando-Ceron, João Guilherme Madeira Araújo, Aaron C. Courville, Pablo Samuel Castro. 1037-1059
- Value Internalization: Learning and Generalizing from Social Reward. Frieda Rong, Max Kleiman-Weiner. 1060-1071
- Mixture of Experts in a Mixture of RL settings. Timon Willi, Johan Samir Obando-Ceron, Jakob Nicolaus Foerster, Gintare Karolina Dziugaite, Pablo Samuel Castro. 1072-1105
- Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning. Davide Corsi, Davide Camponogara, Alessandro Farinelli. 1106-1123
- On Welfare-Centric Fair Reinforcement Learning. Cyrus Cousins, Kavosh Asadi, Elita Lobo, Michael Littman 0002. 1124-1137
- Inverse Reinforcement Learning with Multiple Planning Horizons. Jiayu Yao, Weiwei Pan, Finale Doshi-Velez, Barbara E. Engelhardt. 1138-1167
- Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation. Yixuan Zhang, Qiaomin Xie. 1168-1210
- More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling. Haque Ishfaq, Yixin Tan, Yu Yang 0001, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu 0002. 1211-1235
- Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis. Qining Zhang, Honghao Wei, Lei Ying 0001. 1236-1251
- A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage. Kevin Tan, Ziping Xu. 1252-1264
- Tiered Reward: Designing Rewards for Specification and Fast Learning of Desired Behavior. Zhiyuan Zhou, Shreyas Sundara Raman, Henry Sowerby, Michael Littman 0002. 1265-1288
- Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach. Bin Hu, Chenyang Zhao, Pu Zhang, Zihao Zhou, Yuanhang Yang, Zenglin Xu, Bin Liu 0022. 1289-1305
- An Idiosyncrasy of Time-discretization in Reinforcement Learning. Kris De Asis, Richard S. Sutton. 1306-1316
- Dreaming of Many Worlds: Learning Contextual World Models aids Zero-Shot Generalization. Sai Prasanna, Karim Farid, Raghu Rajan, André Biedenkapp. 1317-1350
- Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov Decision Processes. Tetsuro Morimura, Kazuhiro Ota, Kenshi Abe, Peinan Zhang. 1351-1376
- Offline Diversity Maximization under Imitation Constraints. Marin Vlastelica, Jin Cheng 0002, Georg Martius, Pavel Kolev. 1377-1409
- Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace. Léopold Maytié, Benjamin Devillers, Alexandre Arnold, Rufin VanRullen. 1410-1426
- Stabilizing Extreme Q-learning by Maclaurin Expansion. Motoki Omura, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada. 1427-1440
- Combining Automated Optimisation of Hyperparameters and Reward Shape. Julian Dierkes, Emma Cramer, Holger H. Hoos, Sebastian Trimpe. 1441-1466
- Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes. He Wang, Laixi Shi, Yuejie Chi. 1467-1510
- PASTA: Pretrained Action-State Transformer Agents. Raphaël Boige, Yannis Flet-Berliac, Lars C. P. M. Quaedvlieg, Arthur Flajolet, Guillaume Richard, Thomas Pierrot. 1511-1532
- Cost Aware Best Arm Identification. Kellen Kanarios, Qining Zhang, Lei Ying 0001. 1533-1545
- ICU-Sepsis: A Benchmark MDP Built from Real Medical Data. Kartik Choudhary, Dhawal Gupta, Philip S. Thomas. 1546-1566
- When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning. Claas Voelcker, Tyler Kastner, Igor Gilitschenski, Amir Massoud Farahmand. 1567-1597
- ROER: Regularized Optimal Experience Replay. Changling Li, Zhang-Wei Hong, Pulkit Agrawal 0001, Divyansh Garg, Joni Pajarinen. 1598-1618
- Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL. Philipp Becker, Sebastian Mossburger, Fabian Otto, Gerhard Neumann. 1619-1655
- RL for Consistency Models: Reward Guided Text-to-Image Generation with Fast Inference. Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun 0002. 1656-1673
- A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo. Miguel Vasco, Takuma Seno, Kenta Kawamoto, Kaushik Subramanian, Peter R. Wurman, Peter Stone. 1674-1710
- Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL. Miguel Suau, Matthijs T. J. Spaan, Frans A. Oliehoek. 1711-1732
- Learning Abstract World Models for Value-preserving Planning with Options. Rafael Rodríguez-Sánchez 0002, George Konidaris 0001. 1733-1758
- Verification-Guided Shielding for Deep Reinforcement Learning. Davide Corsi, Guy Amir, Andoni Rodríguez, Guy Katz, César Sánchez 0001, Roy Fox. 1759-1780
- Learning Discrete World Models for Heuristic Search. Forest Agostinelli, Misagh Soltani. 1781-1792
- Distributionally Robust Constrained Reinforcement Learning under Strong Duality. Zhengfei Zhang, Kishan Panaganti, Laixi Shi, Yanan Sui, Adam Wierman, Yisong Yue. 1793-1821
- Representation Alignment from Human Feedback for Cross-Embodiment Reward Learning from Mixed-Quality Demonstrations. Connor Mattson, Anurag Aribandi, Daniel S. Brown. 1822-1840
- Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning. Gautham Vasan, Yan Wang, Fahim Shahriar, James Bergstra, Martin Jägersand, A. Rupam Mahmood. 1841-1854
- Policy-Guided Diffusion. Matthew Thomas Jackson, Michael T. Matthews, Cong Lu, Benjamin Ellis, Shimon Whiteson, Jakob Nicolaus Foerster. 1855-1872
- Agent-Centric Human Demonstrations Train World Models. James Staley, Elaine Short, Shivam Goel, Yash Shukla. 1873-1886
- Can Differentiable Decision Trees Enable Interpretable Reward Learning from Human Feedback? Akansha Kalra, Daniel S. Brown. 1887-1910
- Imitation Learning from Observation through Optimal Transport. Wei-Di Chang, Scott Fujimoto, David Meger, Gregory Dudek. 1911-1923
- Light-weight Probing of Unsupervised Representations for Reinforcement Learning. Wancong Zhang, Anthony GX-Chen, Vlad Sobal, Yann LeCun, Nicolas Carion. 1924-1949
- Quantifying Interaction Level Between Agents Helps Cost-efficient Generalization in Multi-agent Reinforcement Learning. Yuxin Chen, Chen Tang, Thomas Tian, Chenran Li, Jinning Li, Masayoshi Tomizuka, Wei Zhan. 1950-1964
- Shield Decomposition for Safe Reinforcement Learning in General Partially Observable Multi-Agent Environments. Daniel Melcer, Christopher Amato, Stavros Tripakis. 1965-1994
- Reward Centering. Abhishek Naik, Yi Wan 0004, Manan Tomar, Richard S. Sutton. 1995-2016
- MultiHyRL: Robust Hybrid RL for Obstacle Avoidance against Adversarial Attacks on the Observation Space. Jan de Priester, Zachary I. Bell, Prashant Ganesh, Ricardo G. Sanfelice. 2017-2040
- Investigating the Interplay of Prioritized Replay and Generalization. Parham Mohammad Panahi, Andrew Patterson, Martha White, Adam White 0001. 2041-2058
- Towards General Negotiation Strategies with End-to-End Reinforcement Learning. Bram M. Renting, Thomas M. Moerland, Holger H. Hoos, Catholijn M. Jonker. 2059-2070
- PID Accelerated Temporal Difference Algorithms. Mark Bedaywi, Amin Rakhsha, Amir Massoud Farahmand. 2071-2095
- States as goal-directed concepts: an epistemic approach to state-representation learning. Nadav Amir, Yael Niv, Angela Langdon. 2096-2106
- Posterior Sampling for Continuing Environments. Wanqiao Xu, Shi Dong 0003, Benjamin Van Roy. 2107-2122
- Reinforcement Learning from Delayed Observations via World Models. Armin Karamzade, Kyungmin Kim 0002, Montek Kalsi, Roy Fox. 2123-2139
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity. Johannes Ackermann, Takayuki Osa, Masashi Sugiyama. 2140-2161
- Resource Usage Evaluation of Discrete Model-Free Deep Reinforcement Learning Algorithms. Olivia P. Dizon-Paradis, Stephen E. Wormald, Daniel E. Capecci, Avanti Bhandarkar, Damon L. Woodard. 2162-2177
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning. Rafael Rafailov, Kyle Beltran Hatch, Anikait Singh, Aviral Kumar, Laura M. Smith, Ilya Kostrikov, Philippe Hansen-Estruch, Victor Kolev, Philip J. Ball, Jiajun Wu 0001, Sergey Levine, Chelsea Finn. 2178-2197
- Weight Clipping for Deep Continual and Reinforcement Learning. Mohamed Elsayed 0003, Qingfeng Lan, Clare Lyle, A. Rupam Mahmood. 2198-2217
- A Batch Sequential Halving Algorithm without Performance Degradation. Sotetsu Koyamada, Soichiro Nishimori, Shin Ishii. 2218-2232
- Causal Contextual Bandits with Adaptive Context. Rahul Madhavan, Aurghya Maiti, Gaurav Sinha 0001, Siddharth Barman. 2233-2263
- Policy Architectures for Compositional Generalization in Control. Allan Zhou, Vikash Kumar, Chelsea Finn, Aravind Rajeswaran. 2264-2283
- Semi-Supervised One Shot Imitation Learning. Philipp Wu, Kourosh Hakhamaneshi, Yuqing Du, Igor Mordatch, Aravind Rajeswaran, Pieter Abbeel. 2284-2297
- Cross-environment Hyperparameter Tuning for Reinforcement Learning. Andrew Patterson, Samuel Neumann, Raksha Kumaraswamy, Martha White, Adam White 0001. 2298-2319
- Human-compatible driving agents through data-regularized self-play reinforcement learning. Daphne Cornelisse, Eugene Vinitsky. 2320-2344
- Inception: Efficiently Computable Misinformation Attacks on Markov Games. Jeremy McMahan, Young Wu, Yudong Chen 0001, Jerry Zhu, Qiaomin Xie. 2345-2358
- Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps. Linfeng Zhao, Lawson L. S. Wong. 2359-2372
- Boosting Soft Q-Learning by Bounding. Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni. 2373-2399
- Bandits with Multimodal Structure. Hassan Saber, Odalric-Ambrym Maillard. 2400-2439
- Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning. Erin J. Talvitie, Zilei Shao, Huiying Li, Jinghan Hu, Jacob Boerma, Rory Zhao, Xintong Wang. 2440-2460
- Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. Mohammad Javad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György 0001, Claire Vernade, Mohammad Ghavamzadeh. 2461-2491
- Optimizing Rewards while meeting $\omega$-regular Constraints. Christopher K. Zeitler, Kristina Miller, Sayan Mitra, John Schierman, Mahesh Viswanathan 0001. 2492-2514