Learning What To Do by Simulating the Past
Author / Producer
Lindner, David; Shah, Rohin; Abbeel, Pieter; Dragan, Anca
Date
2021
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
Since reward functions are hard to specify, recent work has focused on learning policies from human feedback. However, such approaches are impeded by the expense of acquiring such feedback. Recent work proposed that agents have access to a source of information that is effectively free: in any environment that humans have acted in, the state will already be optimized for human preferences, and thus an agent can extract information about what humans want from the state. Such learning is possible in principle, but requires simulating all possible past trajectories that could have led to the observed state. This is feasible in gridworlds, but how do we scale it to complex tasks? In this work, we show that by combining a learned feature encoder with learned inverse models, we can enable agents to simulate human actions backwards in time to infer what they must have done. The resulting algorithm is able to reproduce a specific skill in MuJoCo environments given a single state sampled from the optimal policy for that skill.
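To make the backward-simulation idea in the abstract concrete, here is a minimal, hypothetical sketch in Python. It is not the authors' implementation: the names encode, inverse_policy, and inverse_dynamics are stand-ins for the learned feature encoder, learned inverse policy, and learned inverse environment model mentioned above, and the loop simply reconstructs one plausible past trajectory from a single observed state.

# Illustrative sketch only; function names are assumptions, not the paper's API.
def simulate_backwards(observed_state, encode, inverse_policy,
                       inverse_dynamics, horizon=10):
    """Roll one trajectory backwards in time from an observed state.

    Returns the feature vectors of the inferred past states.
    """
    state = observed_state
    features = [encode(state)]
    for _ in range(horizon):
        action = inverse_policy(state)           # infer the action that likely led to `state`
        state = inverse_dynamics(state, action)  # step the environment backwards to the predecessor
        features.append(encode(state))
    return features

In the setting the abstract describes, features inferred this way from the observed state could then be compared against features of rollouts from the agent's current policy to update an estimate of what the human was optimizing; that comparison step is an assumption of this sketch rather than a quotation of the published algorithm.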
Publication status
published
Book title
International Conference on Learning Representations (ICLR 2021)
Publisher
OpenReview
Event
9th International Conference on Learning Representations (ICLR 2021)
Organisational unit
03908 - Krause, Andreas
Notes
Poster presentation on May 6, 2021.