Finite-Time Analysis of Natural Actor-Critic for POMDPs
Author / Producer
Cayci, Semih; He, Niao; Srikant, R.
Date
2024-12
Publication Type
Journal Article
ETH Bibliography
yes
Abstract
We study the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large state spaces. We consider a natural actor-critic method that employs an internal memory state for policy parameterization to address partial observability, function approximation in both actor and critic to address the curse of dimensionality, and a multistep temporal difference learning algorithm for policy evaluation. We establish nonasymptotic error bounds for actor-critic methods for partially observed systems under function approximation. In particular, in addition to the function approximation and statistical errors that also arise in MDPs, we explicitly characterize the error due to the use of finite-state controllers. This additional error is stated in terms of the total variation distance between the belief state in POMDPs and the posterior distribution of the hidden state when using a finite-state controller. Further, in the specific case of sliding-window controllers, we show that this inference error can be made arbitrarily small by using larger window sizes under certain ergodicity conditions.
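As a rough illustration of the method the abstract describes, the sketch below combines a sliding-window finite-state controller (the internal memory state), a multistep temporal difference critic, and a natural policy gradient actor on a made-up two-state toy POMDP. The environment, hyperparameters, and the tabular simplification are assumptions for illustration only, not the paper's implementation; the paper treats general function approximation, of which the tabular case (one-hot, compatible features) is a special instance.

```python
# Minimal illustrative sketch (not the authors' code): natural actor-critic
# with a sliding-window memory state on a hypothetical toy POMDP.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical POMDP: 2 hidden states, 2 observations, 2 actions.
nS, nO, nA = 2, 2, 2
P = np.array([[[0.9, 0.1], [0.2, 0.8]],       # P[a, s, s']
              [[0.3, 0.7], [0.6, 0.4]]])
Obs = np.array([[0.85, 0.15], [0.15, 0.85]])  # Obs[s, o]
R = np.array([[1.0, 0.0], [0.0, 1.0]])        # R[s, a]

gamma, L, n = 0.95, 3, 4        # discount, window size, TD lookahead
nZ = nO ** L                    # number of sliding-window memory states

def push(z, o):
    """Slide observation o into the window (base-nO encoding of the
    last L observations); this plays the role of the memory state."""
    return (z * nO + o) % nZ

def policy(theta, z):
    logits = theta[z] - theta[z].max()
    p = np.exp(logits)
    return p / p.sum()

theta = np.zeros((nZ, nA))      # actor parameters (softmax logits)
Q = np.zeros((nZ, nA))          # critic estimate over (window, action)
alpha, eta = 0.1, 0.05          # critic and actor step sizes (made up)

for episode in range(200):
    s, z = rng.integers(nS), 0
    traj = []
    for t in range(60):
        o = rng.choice(nO, p=Obs[s])
        z = push(z, o)
        a = rng.choice(nA, p=policy(theta, z))
        traj.append((z, a, R[s, a]))
        s = rng.choice(nS, p=P[a, s])

    # Critic: n-step temporal difference update along the trajectory.
    for t in range(len(traj) - n):
        G = sum(gamma**k * traj[t + k][2] for k in range(n))
        zn, an, _ = traj[t + n]
        G += gamma**n * Q[zn, an]
        zt, at, _ = traj[t]
        Q[zt, at] += alpha * (G - Q[zt, at])

    # Actor: natural policy gradient step. With a tabular softmax policy
    # and compatible (one-hot) features, NPG reduces to adding the
    # estimated advantage to the logits.
    for z_ in range(nZ):
        pi = policy(theta, z_)
        theta[z_] += eta * (Q[z_] - pi @ Q[z_])
```

Per the abstract's sliding-window result, the window size L is the knob that trades memory-state cardinality against the inference error: under the stated ergodicity conditions, enlarging L drives the total variation distance between the true belief state and the window-induced posterior arbitrarily small.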
Publication status
published
Journal / series
SIAM Journal on Mathematics of Data Science
Volume
6 (4)
Pages / Article No.
869–896
Publisher
SIAM
Subject
reinforcement learning; partially observable Markov decision processes; natural policy gradient; actor-critic; filter stability
Funding
207343 - RING: Robust Intelligence with Nonconvex Games (SNF)