Learning What To Do by Simulating the Past



Date

2021

Publication Type

Conference Paper

ETH Bibliography

yes


Data

Rights / License

Abstract

Since reward functions are hard to specify, recent work has focused on learning policies from human feedback. However, such approaches are impeded by the expense of acquiring that feedback. Recent work proposed that agents have access to a source of information that is effectively free: in any environment that humans have acted in, the state will already be optimized for human preferences, and thus an agent can extract information about what humans want from the state. Such learning is possible in principle, but requires simulating all possible past trajectories that could have led to the observed state. This is feasible in gridworlds, but how do we scale it to complex tasks? In this work, we show that by combining a learned feature encoder with learned inverse models, we can enable agents to simulate human actions backwards in time to infer what they must have done. The resulting algorithm is able to reproduce a specific skill in MuJoCo environments given a single state sampled from the optimal policy for that skill.
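
The core idea of the abstract, rolling an observed state backwards in time with learned inverse models and using the simulated past to infer what was being optimized, can be illustrated with a short sketch. Everything below is an assumption made for exposition only: the names (encode, inverse_policy, inverse_dynamics), the random linear maps standing in for what would be learned neural networks, and the simplified feature-matching update are illustrative and are not the paper's actual implementation.

    # Illustrative sketch only: backward simulation from a single observed state,
    # followed by a simplified reward-weight update. All models here are random
    # linear stand-ins for networks that would be learned from interaction.
    import numpy as np

    rng = np.random.default_rng(0)
    STATE_DIM, ACTION_DIM, FEATURE_DIM, HORIZON = 8, 2, 4, 5

    # Stand-ins for learned models (hypothetical; in practice these are trained).
    W_enc = rng.normal(size=(FEATURE_DIM, STATE_DIM))
    W_inv_pi = rng.normal(size=(ACTION_DIM, STATE_DIM))
    W_inv_dyn = rng.normal(size=(STATE_DIM, STATE_DIM + ACTION_DIM))

    def encode(s):
        # feature encoder phi(s)
        return W_enc @ s

    def inverse_policy(s):
        # guess the action that led into state s
        return np.tanh(W_inv_pi @ s)

    def inverse_dynamics(s, a):
        # guess the predecessor state given the current state and incoming action
        return W_inv_dyn @ np.concatenate([s, a])

    def simulate_past(s_T, horizon):
        # Roll the observed state backwards in time with the inverse models.
        states, s = [s_T], s_T
        for _ in range(horizon):
            a = inverse_policy(s)
            s = inverse_dynamics(s, a)
            states.append(s)
        return states  # [s_T, s_{T-1}, ..., s_{T-horizon}]

    # Observed state, assumed to be the result of human optimization.
    s_T = rng.normal(size=STATE_DIM)
    past = simulate_past(s_T, HORIZON)

    # Simplified update: push reward weights toward features that the observed
    # state has but the simulated earlier states lack (i.e. features the "human"
    # appears to have optimized for). This is a loose caricature of a
    # feature-matching gradient, not the exact objective from the paper.
    theta = np.zeros(FEATURE_DIM)
    phi_past = np.mean([encode(s) for s in past[1:]], axis=0)
    theta += 0.1 * (encode(s_T) - phi_past)
    print("inferred reward weights:", theta)

In the paper, the feature encoder and the inverse models are learned from environment interaction and the reward update comes from a more careful objective; the sketch only shows the overall data flow from a single observed state to a reward estimate.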

Publication status

published

Editor

Book title

International Conference on Learning Representations (ICLR 2021)

Journal / series

Volume

Pages / Article No.

Publisher

OpenReview

Event

9th International Conference on Learning Representations (ICLR 2021)

Edition / version

Methods

Software

Geographic location

Date collected

Date created

Subject

Organisational unit

03908 - Krause, Andreas / Krause, Andreas

Notes

Poster presentation on May 6, 2021.

Funding

Related publications and datasets