Future memories are not needed for large classes of POMDPs
OPEN ACCESS
Author / Producer
Date
2023-05
Publication Type
Journal Article
ETH Bibliography
yes
Citations
Altmetric
OPEN ACCESS
Data
Rights / License
Abstract
Optimal policies for partially observed Markov decision processes (POMDPs) are history-dependent: Decisions are made based on the entire history of observations. Memoryless policies, which take decisions based on the last observation only, are generally considered useless in the literature because we can construct POMDP instances for which optimal memoryless policies are arbitrarily worse than history-dependent ones. Our purpose is to challenge this belief. We show that optimal memoryless policies can be computed efficiently using mixed integer linear programming (MILP), and perform reasonably well on a wide range of instances from the literature. When strengthened with valid inequalities, the linear relaxation of this MILP provides high quality upper-bounds on the value of an optimal history dependent policy. Furthermore, when used with a finite horizon POMDP problem with memoryless policies as rolling optimization problem, a model predictive control approach leads to an efficient history-dependent policy, which we call the short memory in the future (SMF) policy. Basically, the SMF policy leverages these memoryless policies to build an approximation of the Bellman value function. Numerical experiments show the efficiency of our approach on benchmark instances from the literature.
Permanent link
Publication status
published
External links
Editor
Book title
Journal / series
Volume
51 (3)
Pages / Article No.
270 - 277
Publisher
Elsevier
Event
Edition / version
Methods
Software
Geographic location
Date collected
Date created
Subject
Partially observable Markov decision process; Deterministic policies; Mathematical programming
Organisational unit
02286 - Swiss Data Science Center (SDSC) / Swiss Data Science Center (SDSC)