Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning


Date

2021

Publication Type

Conference Paper

ETH Bibliography

yes

Abstract

In real-world tasks, reinforcement learning (RL) agents frequently encounter situations that are not present during training. To ensure reliable performance, RL agents need to be robust against worst-case situations. The robust RL framework addresses this challenge via a worst-case optimization between an agent and an adversary. Previous robust RL algorithms are either sample-inefficient, lack robustness guarantees, or do not scale to large problems. We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem while attaining near-optimal sample complexity guarantees. RH-UCRL is a model-based reinforcement learning (MBRL) algorithm that effectively distinguishes between epistemic and aleatoric uncertainty, and efficiently explores both the agent and adversary decision spaces during policy learning. We scale RH-UCRL to complex tasks via neural network ensemble models and neural network policies. Experimentally, we demonstrate that RH-UCRL outperforms other robust deep RL algorithms in a variety of adversarial environments.
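
The worst-case optimization the abstract refers to is the standard robust RL objective, in which the agent policy is trained against the return-minimizing adversary policy, and the epistemic/aleatoric split is what drives the optimism (agent) and pessimism (adversary) in the paper's title. As a minimal illustrative sketch, not the authors' implementation: assuming NumPy and an ensemble of learned dynamics models, with the function name, shapes, and example values all hypothetical, this is the standard way an ensemble separates the two kinds of uncertainty:

    # Illustrative sketch only; not the paper's code. Assumes each ensemble
    # member produces sampled next-state predictions for the same inputs.
    import numpy as np

    def decompose_uncertainty(predictions: np.ndarray):
        """predictions: shape (n_models, n_samples, state_dim)."""
        # Aleatoric uncertainty: average within-model spread, i.e. the
        # irreducible noise that remains no matter how much data is collected.
        aleatoric = predictions.std(axis=1).mean(axis=0)
        # Epistemic uncertainty: disagreement between the models' mean
        # predictions; it shrinks as data accumulates.
        epistemic = predictions.mean(axis=1).std(axis=0)
        return epistemic, aleatoric

    # Example: 5 ensemble members, 100 samples each, 3-dimensional state.
    preds = np.random.randn(5, 100, 3)
    epi, ale = decompose_uncertainty(preds)

Because only the epistemic term shrinks with data, confidence bounds built from it can be used optimistically by the agent and pessimistically by the adversary within the same model.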

Publication status

published

Book title

Proceedings of the 38th International Conference on Machine Learning

Volume

139

Pages / Article No.

2254–2264

Publisher

PMLR

Event

38th International Conference on Machine Learning (ICML 2021)

Organisational unit

03908 - Krause, Andreas

Funding

815943 - Reliable Data-Driven Decision Making in Cyber-Physical Systems (EC)
19-2 FEL-47 - Robust Sample-Efficient Learning when Data is Costly (ETHZ)
180545 - NCCR Automation (phase I) (SNF)
