Stochastic convex optimization for provably efficient apprenticeship learning
- Conference Paper
Rights / license: In Copyright - Non-Commercial Use Permitted
We consider large-scale Markov decision processes (MDPs) with an unknown cost function and employ stochastic convex optimization tools to address the problem of imitation learning, which consists of learning a policy from a finite set of expert demonstrations. We adopt the apprenticeship learning formalism, which carries the assumption that the true cost function can be represented as a linear combination of some known features. Existing inverse reinforcement learning algorithms come with strong theoretical guarantees, but are computationally expensive because they use reinforcement learning or planning algorithms as a subroutine. On the other hand, state-of-the-art policy-gradient-based algorithms (like IM-REINFORCE, IM-TRPO, and GAIL) achieve significant empirical success in challenging benchmark tasks, but are less well understood in terms of theory. With an emphasis on non-asymptotic guarantees of performance, we propose a method that directly learns a policy from expert demonstrations, bypassing the intermediate step of learning the cost function, by formulating the problem as a single convex optimization problem over occupancy measures. We develop a computationally efficient algorithm and derive high-confidence excess-loss bounds on the quality of the extracted policy, utilizing results from uncertain convex optimization and recent works in approximate linear programming for solving forward MDPs.
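The idea of posing imitation learning as a single convex program over occupancy measures can be illustrated on a toy MDP. The sketch below is not the paper's algorithm (which targets large-scale MDPs via stochastic convex optimization and approximate linear programming); it is a minimal exact LP, under assumed toy dimensions and a random "expert", that matches the expert's feature expectations subject to the Bellman flow constraints and then reads a policy off the optimal occupancy measure.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
S, A, K = 3, 2, 4                              # toy sizes: states, actions, features
gamma = 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))     # transition kernel P[s, a, s']
nu0 = np.ones(S) / S                           # initial state distribution
Phi = rng.random((S * A, K))                   # feature matrix, rows indexed by (s, a)

def occupancy(policy):
    """Normalized discounted occupancy measure mu(s, a) of a stationary policy."""
    P_pi = np.einsum('sap,sa->sp', P, policy)  # state-to-state kernel under policy
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * nu0)
    return (d[:, None] * policy).reshape(S * A)

# "Expert" feature expectations, here generated from a random stand-in policy
expert_pi = rng.dirichlet(np.ones(A), size=S)
f_E = Phi.T @ occupancy(expert_pi)

# LP variables: mu (S*A entries) and a slack t; objective: minimize t
c = np.zeros(S * A + 1)
c[-1] = 1.0

# Bellman flow equalities:
# sum_a mu(s,a) - gamma * sum_{s',a'} P(s | s',a') mu(s',a') = (1-gamma) nu0(s)
A_eq = np.zeros((S, S * A + 1))
for s in range(S):
    for sp in range(S):
        for a in range(A):
            A_eq[s, sp * A + a] = float(sp == s) - gamma * P[sp, a, s]
b_eq = (1 - gamma) * nu0

# Feature matching within slack: |Phi^T mu - f_E| <= t componentwise
A_ub = np.zeros((2 * K, S * A + 1))
A_ub[:K, :S * A], A_ub[:K, -1] = Phi.T, -1.0
A_ub[K:, :S * A], A_ub[K:, -1] = -Phi.T, -1.0
b_ub = np.concatenate([f_E, -f_E])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (S * A + 1))
mu = res.x[:S * A].reshape(S, A)
policy = mu / mu.sum(axis=1, keepdims=True)    # extract the learned policy
```

Because the expert's own occupancy measure is feasible with zero slack, the optimal value is zero here; the point of the construction is that the whole problem is one LP, with no inner reinforcement-learning or planning subroutine.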
Organisational unit: 03751 - Lygeros, John
787845 - Optimal control at large (EC)
Notes: Poster presentation on December 14, 2019.