Conference Paper
Rights / license: In Copyright - Non-Commercial Use Permitted
Abstract: We consider the problem of learning discounted-cost optimal control policies for unknown deterministic discrete-time systems with continuous state and action spaces. We show that the policy evaluation step of the well-known policy iteration (PI) algorithm can be characterized as the solution to an infinite-dimensional linear program (LP). However, when such an LP is approximated by a finite-dimensional program, the PI algorithm loses its nominal properties. We propose a data-driven PI scheme that ensures a certain monotonic behavior and allows expert knowledge of the system to be incorporated. A numerical example illustrates the effectiveness of the proposed algorithm.
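For readers unfamiliar with the evaluate/improve structure the abstract refers to, the following is a minimal sketch of classical policy iteration on a toy tabular problem. The paper treats continuous state and action spaces with approximate, data-driven evaluation; this tabular version (with a hypothetical random deterministic system and made-up costs) only illustrates exact policy evaluation followed by greedy improvement.

```python
import numpy as np

# Toy deterministic MDP (hypothetical data, for illustration only).
n_states, n_actions = 4, 2
gamma = 0.9  # discount factor

rng = np.random.default_rng(0)
next_state = rng.integers(0, n_states, size=(n_states, n_actions))  # dynamics
cost = rng.random((n_states, n_actions))                            # stage cost

def evaluate(policy):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = c_pi."""
    P = np.zeros((n_states, n_states))
    c = np.empty(n_states)
    for s in range(n_states):
        a = policy[s]
        P[s, next_state[s, a]] = 1.0
        c[s] = cost[s, a]
    return np.linalg.solve(np.eye(n_states) - gamma * P, c)

def improve(V):
    """Greedy improvement: pick the cost-minimizing action at each state."""
    Q = cost + gamma * V[next_state]  # Q[s, a]
    return Q.argmin(axis=1)

policy = np.zeros(n_states, dtype=int)
while True:
    V = evaluate(policy)
    new_policy = improve(V)
    if np.array_equal(new_policy, policy):
        break  # fixed point: policy is greedy w.r.t. its own value
    policy = new_policy
```

In the infinite-dimensional setting the exact linear solve above is replaced by an LP over value functions, and the paper's contribution concerns preserving PI's monotonicity once that LP is approximated with finitely many decision variables and constraints.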
Book title: 2019 IEEE 58th Conference on Decision and Control (CDC)
Organisational unit: 03751 - Lygeros, John
787845 - Optimal control at large (EC)
Notes: Conference lecture on December 11, 2019.