Approximate Dynamic Programming: theoretical guarantees and practical algorithms for a continuous space setting

Open access
Date
2019-10
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Many problems in science and engineering can be cast as discrete-time, continuous-space, infinite-horizon stochastic optimal control problems. The so-called value function and Q-function both characterise the solution of such problems, but both are intractable to compute in all but a few special cases. This thesis focuses on methods that approximate the value function and Q-function.
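For concreteness, these objects arise in the standard discounted-cost setting; the notation here (stage cost \ell, dynamics f, disturbance \xi, discount factor \gamma \in (0,1)) is illustrative and not taken verbatim from the thesis. The value function and Q-function are the fixed points of the Bellman equations
\[
V^\star(x) \;=\; \min_{u}\; \mathbb{E}\!\left[\ell(x,u) + \gamma\, V^\star\!\big(f(x,u,\xi)\big)\right],
\qquad
Q^\star(x,u) \;=\; \mathbb{E}\!\left[\ell(x,u) + \gamma \min_{u'} Q^\star\!\big(f(x,u,\xi),\,u'\big)\right],
\]
and it is the expectation over the disturbance together with the minimisation over a continuous input space that makes them intractable to compute exactly.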
We consider the linear programming approach to approximate dynamic programming, which uses the so-called Bellman inequality to compute approximate value functions and Q-functions that are point-wise under-estimators of their optimal counterparts. For this approximation method we provide theoretical guarantees on the value function and Q-function approximation error, and on the sub-optimality of a policy generated from such lower-bounding approximations. In particular, the online performance guarantee is obtained by analysing an iterated version of the greedy policy, and the fitting error guarantee by analysing an iterated version of the Bellman inequality. These guarantees complement the existing bounds in the literature.
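As a sketch of the linear programming formulation referred to above (a standard form in the approximate dynamic programming literature; the function class \mathcal{F} and state-relevance measure c are illustrative notation): any V satisfying the Bellman inequality at every state is a point-wise under-estimator of V^\star, so the tightest such under-estimator is selected by solving
\[
\max_{V \in \mathcal{F}} \;\int V(x)\, c(\mathrm{d}x)
\quad \text{subject to} \quad
V(x) \;\le\; \min_{u}\; \mathbb{E}\!\left[\ell(x,u) + \gamma\, V\!\big(f(x,u,\xi)\big)\right] \;\;\text{for all } x.
\]
Restricting \mathcal{F} to, for example, a finite-dimensional linear parameterisation makes the program tractable but introduces the approximation error that the guarantees above bound; the greedy policy generated from an approximation V is then \pi(x) \in \arg\min_{u} \mathbb{E}[\ell(x,u) + \gamma\, V(f(x,u,\xi))].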
Given a collection of lower-bounding approximate value functions, an improved approximation can be constructed by taking the point-wise maximum over the collection; the challenge is how to compute the collection itself. To address this challenge, we introduce a novel formulation, referred to as the point-wise maximum approach to approximate dynamic programming, and use it to propose algorithms that iteratively construct a collection of lower-bounding value functions with the objective of maximising the point-wise maximum of the collection. We empirically demonstrate the advantages of the proposed algorithms through a range of numerical examples that indicate classes of problems where they improve upon state-of-the-art methods. A key result from the numerical studies is that the proposed algorithms can provide practically useful sub-optimality bounds for the online performance of any policy, even when the collection of approximate value functions is itself impractical to use for generating a policy.
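To make the iterative construction concrete, a minimal Python sketch follows; fit_lower_bound, samples, and both function names are hypothetical placeholders assuming an LP-based fitting step along the lines sketched above, not the thesis's actual implementation.

def pointwise_max(collection, x):
    # Improved lower bound on V*(x): the point-wise maximum over a
    # collection of individually lower-bounding value functions.
    return max(V(x) for V in collection)

def build_collection(fit_lower_bound, samples, n_iterations):
    # Hypothetical outer loop: each iteration fits one more lower-bounding
    # value function, chosen to raise the point-wise maximum at `samples`.
    collection = []
    for _ in range(n_iterations):
        # `fit_lower_bound` is assumed to solve one LP-based fitting
        # problem (enforcing the Bellman inequality) and to receive the
        # current collection so it can target states where the existing
        # point-wise maximum is still loose.
        collection.append(fit_lower_bound(collection, samples))
    return collection

Because every member of the collection under-estimates V^\star, the point-wise maximum evaluated along a simulated trajectory certifies a lower bound on the optimal cost; comparing it with the realised cost of any policy yields the sub-optimality bound mentioned above, even if the collection is too expensive to evaluate inside a real-time control loop.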
Permanent link
https://doi.org/10.3929/ethz-b-000401151
Publication status
published
Publisher
ETH Zurich
Subject
approximate dynamic programming; stochastic optimal control; dynamic programming; infinite dimensional linear programming; iterative methods
Organisational unit
03751 - Lygeros, John