Q-learning Helps Offline In-Context RL

Date

2025-03-25

Publication Type

Master Thesis

ETH Bibliography

yes

Abstract

In this work, we explore the integration of Reinforcement Learning (RL) objectives within a scalable offline In-Context RL (ICRL) framework. None of the existing offline ICRL approaches optimizes an RL objective, even though doing so should benefit the resulting agents' performance. Through experiments across more than 150 datasets derived from GridWorld-based and MuJoCo environments, we demonstrate that optimizing RL objectives improves performance by approximately 30% on average over the widely established Transformer-based Algorithm Distillation (AD) baseline, across various dataset coverages, structures, expertise levels, and environment complexities. Moreover, RL-based approaches achieve twice the performance of AD on the challenging XLand-MiniGrid environment when trained on only a tiny fraction of the available dataset. Our results also reveal that offline RL-based methods outperform online RL approaches in this setting, a non-trivial finding given the need to adapt to out-of-distribution tasks. These findings underscore the importance of aligning the learning objective with RL's reward-maximization goal and demonstrate that offline RL is a promising direction for ICRL. They provide strong evidence that future offline ICRL methods should explicitly optimize RL objectives.
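
The contrast at the heart of the abstract is between AD's supervised imitation objective and a reward-maximizing objective such as Q-learning. Below is a minimal illustrative sketch of that difference in PyTorch, assuming a causal transformer that emits per-step action logits (reused as Q-values for the TD loss); the function names, shapes, and hyperparameters are assumptions for illustration, not the thesis implementation.

import torch
import torch.nn.functional as F

def ad_loss(action_logits, actions):
    # Algorithm Distillation: plain cross-entropy toward the actions
    # recorded in the dataset; the reward signal is never used.
    return F.cross_entropy(
        action_logits.flatten(0, 1),  # (batch * seq, num_actions)
        actions.flatten(),            # (batch * seq,)
    )

def q_learning_loss(q_values, target_q_next, actions, rewards, dones, gamma=0.99):
    # Offline Q-learning: temporal-difference regression toward the
    # reward-maximizing bootstrap target from a target network.
    q_taken = q_values.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        td_target = rewards + gamma * (1.0 - dones) * target_q_next.max(-1).values
    return F.mse_loss(q_taken, td_target)

# Toy shapes: 8 contexts, 16 steps each, 5 discrete actions.
B, T, A = 8, 16, 5
logits = torch.randn(B, T, A, requires_grad=True)  # stand-in for transformer outputs
actions = torch.randint(A, (B, T))
rewards, dones = torch.randn(B, T), torch.zeros(B, T)

print(ad_loss(logits, actions))                      # imitation objective (AD)
print(q_learning_loss(logits, torch.randn(B, T, A),  # RL objective (Q-learning)
                      actions, rewards, dones))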

Publication status

published

Contributors

Examiner: Kurenkov, Vladislav
Examiner: Nikulin, Alexander
Examiner: Sachan, Mrinmaya

Publisher

ETH Zurich

Organisational unit

09684 - Sachan, Mrinmaya / Sachan, Mrinmaya
