
Open access
Author
Date
2021
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
In recent years, the remarkable achievements of reinforcement learning (RL) have granted it a spot at the forefront of AI research. Crucially, most of these results were obtained in simulated environments, where poor actions have no harmful consequences. However, to unlock RL's full potential, we wish to deploy it in the real world. While this broadens the scope for RL's beneficial impact, it also amplifies the consequences of its harmful actions. Therefore, we must understand and address the causes that may induce RL agents to make potentially damaging decisions in real-world scenarios.
This dissertation studies unsafe behaviors that may arise in RL from inaccurate models in small-data regimes. In particular, it focuses on two problems: robustness to distributional shift, i.e., not overfitting to the training data and generalizing to previously unseen environmental conditions, and safe exploration, i.e., safely acquiring data during training.
We start by introducing a model-free approximation of robustness indicators from linear control theory. We leverage our method to design control policies for a Furuta pendulum and we demonstrate their robustness in sim-to-real and hardware experiments that include a significant distributional shift.
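The abstract does not specify which robustness indicators are approximated or how; purely as an illustrative sketch, the snippet below estimates two classical indicators from linear control theory, gain and phase margin, directly from sampled open-loop frequency-response data instead of an explicit plant model. The function name, the interpolation scheme, and the choice of indicators are assumptions for illustration, not the thesis' method.

```python
import numpy as np

def margins_from_frequency_response(omega, L):
    """Estimate gain margin (ratio) and phase margin (degrees) from samples
    of the open-loop frequency response L(j*omega), measured model-free.

    omega : 1-D array of frequencies (rad/s), ascending
    L     : complex 1-D array, open-loop response at those frequencies
    """
    mag = np.abs(L)
    phase = np.unwrap(np.angle(L))  # radians, unwrapped to avoid jumps

    # Phase crossover (phase = -180 deg): gain margin = 1 / |L| there.
    gm = np.inf
    idx = np.where(np.diff(np.sign(phase + np.pi)))[0]
    if idx.size:
        i = idx[0]
        t = (-np.pi - phase[i]) / (phase[i + 1] - phase[i])  # linear interp.
        mag_pc = mag[i] + t * (mag[i + 1] - mag[i])
        gm = 1.0 / mag_pc

    # Gain crossover (|L| = 1): phase margin = 180 deg + phase there.
    pm = np.inf
    idx = np.where(np.diff(np.sign(mag - 1.0)))[0]
    if idx.size:
        i = idx[0]
        t = (1.0 - mag[i]) / (mag[i + 1] - mag[i])
        phase_gc = phase[i] + t * (phase[i + 1] - phase[i])
        pm = np.degrees(phase_gc + np.pi)

    return gm, pm
```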
Subsequently, we study goal-oriented safe exploration when safety can be expressed as a set of unknown smooth constraints. To address this, we propose an algorithm that enjoys safety and completeness guarantees, and we show in simulated experiments that it significantly improves over existing methods in terms of sample efficiency. We then extend it to adaptive control problems, a class of classical control problems concerned with the distributional shift induced by exogenous variables. We deploy our algorithm to control a rotational axis drive in constantly changing environments.
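The abstract leaves the algorithm itself unspecified; the sketch below only illustrates the common ingredient of Gaussian-process-based safe exploration under unknown smooth constraints (in the spirit of methods such as SafeOpt): fit a GP to the constraint measurements and restrict evaluations to inputs whose pessimistic confidence bound still satisfies the constraint. The function name, kernel, and confidence parameter beta are illustrative assumptions, not the goal-oriented algorithm proposed in the thesis.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def safe_candidates(X_cand, X_obs, q_obs, threshold, beta=2.0):
    """Return candidate inputs that are safe with high confidence.

    X_cand    : (n, d) candidate inputs
    X_obs     : (m, d) inputs where the unknown constraint q was measured
    q_obs     : (m,)   measured constraint values; x is safe iff q(x) >= threshold
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-4)
    gp.fit(X_obs, q_obs)
    mu, sigma = gp.predict(X_cand, return_std=True)
    lcb = mu - beta * sigma            # pessimistic estimate of q
    return X_cand[lcb >= threshold]    # keep only points safe w.h.p.

# Example: 1-D inputs, constraint q(x) >= 0 measured at three known-safe points.
X_cand = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
X_obs = np.array([[0.4], [0.5], [0.6]])
q_obs = np.array([0.3, 0.5, 0.35])
print(safe_candidates(X_cand, X_obs, q_obs, threshold=0.0))
```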
Finally, we present a novel framework to incorporate prior knowledge in safe exploration problems that allows us to lift many assumptions made by previous methods, e.g., smoothness. We provide safety guarantees for this framework and we combine it with deep RL agents to safely train control policies in challenging environments.
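The abstract does not describe the framework's mechanics. One common way prior knowledge enters safe training of deep RL policies, shown below only as a hedged illustration, is an action-level safety filter that overrides the learning agent with a known-safe backup controller whenever a prior safety model rejects the proposed action. The interfaces is_safe and backup_policy are hypothetical placeholders, not the framework introduced in the thesis.

```python
from typing import Callable
import numpy as np

def filtered_action(state: np.ndarray,
                    agent_action: np.ndarray,
                    is_safe: Callable[[np.ndarray, np.ndarray], bool],
                    backup_policy: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Return the agent's proposed action if the prior safety model accepts it;
    otherwise fall back to the known-safe backup controller."""
    if is_safe(state, agent_action):
        return agent_action
    return backup_policy(state)
```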
Permanent link
https://doi.org/10.3929/ethz-b-000540581
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Subject
Reinforcement learning (RL)
Organisational unit
03908 - Krause, Andreas / Krause, Andreas