Q-learning Helps Offline In-Context RL
OPEN ACCESS
Author / Producer
Date
2025-03-25
Publication Type
Master Thesis
ETH Bibliography
yes
Abstract
In this work, we explore the integration of Reinforcement Learning (RL) objectives within a scalable offline In-Context RL (ICRL) framework. None of the existing offline ICRL approaches optimizes an RL objective, even though doing so is expected to benefit the resulting agents' performance. Through experiments across more than 150 datasets derived from GridWorld-based and MuJoCo environments, we demonstrate that optimizing RL objectives improves performance by approximately 30% on average over the widely established Transformer-based Algorithm Distillation (AD) baseline, across a range of dataset coverages, structures, expertise levels, and environment complexities. Moreover, RL-based approaches achieve twice the performance of AD when tested on the challenging XLand-MiniGrid environment with only a tiny fraction of the available dataset. Our results also reveal that offline RL-based methods outperform online RL approaches in this setting, a non-trivial finding given the need to adapt to out-of-distribution tasks. These findings underscore the importance of aligning the learning objective with RL's reward-maximization goal and show that offline RL is a promising direction for ICRL. Our findings provide strong evidence that future offline ICRL methods should explicitly optimize RL objectives.
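The abstract does not spell out implementation details, so purely as an illustration of what "optimizing an RL objective" inside an in-context framework can look like, here is a minimal PyTorch sketch: a causal transformer conditions on the in-context history of transitions and is trained with a one-step Q-learning (temporal-difference) loss rather than the next-action prediction loss used by Algorithm Distillation. All names, shapes, and hyperparameters (ContextQNet, hidden_dim, the batch layout) are assumptions made for this sketch, not the thesis's actual code.

    # Illustrative sketch only; names and batch layout are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ContextQNet(nn.Module):
        # Each token embeds (s_t, a_{t-1}, r_{t-1}); the causal mask restricts
        # attention to the past, so Q(s_t, .) is conditioned only on the history.
        def __init__(self, state_dim, n_actions, hidden_dim=128, n_layers=2, n_heads=4):
            super().__init__()
            self.embed = nn.Linear(state_dim + n_actions + 1, hidden_dim)
            layer = nn.TransformerEncoderLayer(hidden_dim, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.q_head = nn.Linear(hidden_dim, n_actions)

        def forward(self, states, prev_actions_onehot, prev_rewards):
            # states: (B, T, state_dim); prev_* are shifted one step (zeros at t=0),
            # prev_actions_onehot: (B, T, n_actions), prev_rewards: (B, T, 1).
            tokens = self.embed(torch.cat([states, prev_actions_onehot, prev_rewards], -1))
            mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1)).to(tokens.device)
            h = self.encoder(tokens, mask=mask)
            return self.q_head(h)  # (B, T, n_actions): Q(s_t, .) given context

    def q_learning_loss(q_net, target_net, batch, gamma=0.99):
        # Standard one-step TD target, computed per timestep within the context:
        #   Q(s_t, a_t) <- r_t + gamma * max_a Q_target(s_{t+1}, a) * (1 - done_t)
        # batch["actions_onehot"]: (B, T, n_actions); "rewards", "dones": (B, T).
        q = q_net(batch["states"], batch["prev_actions_onehot"], batch["prev_rewards"])
        with torch.no_grad():
            q_next = target_net(batch["states"], batch["prev_actions_onehot"], batch["prev_rewards"])
        q_taken = (q[:, :-1] * batch["actions_onehot"][:, :-1]).sum(-1)
        next_max = q_next[:, 1:].max(-1).values
        target = batch["rewards"][:, :-1] + gamma * next_max * (1.0 - batch["dones"][:, :-1])
        return F.mse_loss(q_taken, target)
        # target_net is a periodically synced copy of q_net, as in DQN-style training.

An AD-style baseline would instead train the same backbone with a cross-entropy loss on the next action; the sketch swaps only the training objective, which is the comparison the abstract describes.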
Publication status
published
Contributors
Examiner: Kurenkov, Vladislav
Examiner: Nikulin, Alexander
Examiner: Sachan, Mrinmaya
Publisher
ETH Zurich
Organisational unit
09684 - Sachan, Mrinmaya / Sachan, Mrinmaya