Stochastic convex optimization for provably efficient apprenticeship learning

Open access
Date
2019-12-14
Type
Conference Paper
ETH Bibliography
yes
Abstract
We consider large-scale Markov decision processes (MDPs) with an unknown cost function and employ stochastic convex optimization tools to address the problem of imitation learning, which consists of learning a policy from a finite set of expert demonstrations. We adopt the apprenticeship learning formalism, which carries the assumption that the true cost function can be represented as a linear combination of some known features. Existing inverse reinforcement learning algorithms come with strong theoretical guarantees, but are computationally expensive because they use reinforcement learning or planning algorithms as a subroutine. On the other hand, state-of-the-art policy-gradient-based algorithms (such as IM-REINFORCE, IM-TRPO, and GAIL) achieve significant empirical success in challenging benchmark tasks, but are less well understood in terms of theory. With an emphasis on non-asymptotic guarantees of performance, we propose a method that directly learns a policy from expert demonstrations, bypassing the intermediate step of learning the cost function, by formulating the problem as a single convex optimization problem over occupancy measures. We develop a computationally efficient algorithm and derive high-confidence excess-loss bounds on the quality of the extracted policy, utilizing results from uncertain convex optimization and recent works in approximate linear programming for solving forward MDPs.
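
For readers unfamiliar with the occupancy-measure viewpoint, the sketch below illustrates the general idea in Python with cvxpy: under the apprenticeship learning assumption that the cost is linear in known features, c(s, a) = w^T psi(s, a), one can minimize the worst-case cost difference between a candidate occupancy measure and an empirical expert occupancy measure, subject to the Bellman flow constraints of a small tabular MDP. This is the standard feature-matching formulation rather than the paper's exact algorithm, and all quantities (transition kernel P, features Psi, expert estimate mu_E) are randomly generated placeholders.

import numpy as np
import cvxpy as cp

# Small tabular MDP with placeholder data (not from the paper).
S, A, gamma = 5, 3, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] : transition kernel
nu0 = np.ones(S) / S                         # initial state distribution
Psi = rng.standard_normal((S * A, 4))        # 4 known cost features per (s, a) pair
mu_E = rng.dirichlet(np.ones(S * A))         # empirical expert occupancy (placeholder)

mu = cp.Variable(S * A, nonneg=True)         # occupancy measure, flattened as s * A + a

# Bellman flow constraints: for every state s,
#   sum_a mu(s, a) = (1 - gamma) * nu0(s) + gamma * sum_{s', a'} P(s | s', a') * mu(s', a')
E_out = np.kron(np.eye(S), np.ones(A))       # sums mu over actions for each state
P_flat = P.reshape(S * A, S)                 # rows indexed by (s, a), columns by next state
flow = [E_out @ mu == (1 - gamma) * nu0 + gamma * P_flat.T @ mu]

# The worst case over cost weights ||w||_2 <= 1 of w^T Psi^T (mu - mu_E) equals the
# norm below, so imitation reduces to a single convex program over occupancy measures.
objective = cp.Minimize(cp.norm(Psi.T @ (mu - mu_E), 2))
cp.Problem(objective, flow).solve()

# Extract a stationary policy: pi(a | s) is proportional to mu(s, a).
mu_sa = mu.value.reshape(S, A)
pi = mu_sa / mu_sa.sum(axis=1, keepdims=True)
print(pi)

The policy is recovered directly from the optimal occupancy measure, which is what lets this approach skip the intermediate step of learning the cost function.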
Permanent link
https://doi.org/10.3929/ethz-b-000493336
Publication status
published
Publisher
OPTRL 2019
Organisational unit
03751 - Lygeros, John / Lygeros, John
Funding
787845 - Optimal control at large (EC)
Notes
Poster presentation on December 14, 2019.