Finite-Time Analysis of Natural Actor-Critic for POMDPs



Date

2024-12

Publication Type

Journal Article

ETH Bibliography

yes

Abstract

We study the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large state spaces. We consider a natural actor-critic method that employs an internal memory state for policy parameterization to address partial observability, function approximation in both actor and critic to address the curse of dimensionality, and a multistep temporal difference learning algorithm for policy evaluation. We establish nonasymptotic error bounds for actor-critic methods for partially observed systems under function approximation. In particular, in addition to the function approximation and statistical errors that also arise in MDPs, we explicitly characterize the error due to the use of finite-state controllers. This additional error is stated in terms of the total variation distance between the belief state in POMDPs and the posterior distribution of the hidden state when using a finite-state controller. Further, in the specific case of sliding-window controllers, we show that this inference error can be made arbitrarily small by using larger window sizes under certain ergodicity conditions.
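
The abstract describes the algorithmic ingredients at a high level. The sketch below is a minimal, illustrative Python rendering of a sliding-window natural actor-critic loop on a toy two-state POMDP; the POMDP, step sizes, window length, and all identifiers are assumptions chosen for illustration, not the paper's setup. The paper's critic uses multistep temporal difference learning; for brevity this sketch uses a one-step TD(0) (SARSA-style) critic.

import numpy as np

rng = np.random.default_rng(0)

# Toy POMDP (illustrative, not from the paper): 2 hidden states, 2 observations, 2 actions.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],      # T[s, a] = P(next state | s, a)
              [[0.3, 0.7], [0.8, 0.2]]])
O = np.array([[0.85, 0.15], [0.15, 0.85]])   # O[s] = P(observation | s)
R = np.array([[1.0, 0.0], [0.0, 1.0]])       # reward for action a in hidden state s
nS, nO, nA = 2, 2, 2

WINDOW = 3      # sliding-window length: the policy conditions on the last 3 observations
GAMMA = 0.95
DIM = nO**WINDOW * nA

def phi(win, a):
    """One-hot feature over (observation window, action); tabular over windows."""
    idx = 0
    for o in win:
        idx = idx * nO + o
    f = np.zeros(DIM)
    f[idx * nA + a] = 1.0
    return f

def policy_probs(theta, win):
    """Log-linear softmax policy over the observation window (the finite-memory controller)."""
    logits = np.array([theta @ phi(win, a) for a in range(nA)])
    z = np.exp(logits - logits.max())
    return z / z.sum()

def env_step(s, a):
    s2 = rng.choice(nS, p=T[s, a])
    return s2, rng.choice(nO, p=O[s2]), R[s, a]

theta = np.zeros(DIM)                 # actor parameters
for outer in range(50):               # actor iterations
    # Critic: TD(0) policy evaluation of Q under the current window policy
    # (the paper uses a multistep TD critic; one-step is used here for brevity).
    w = np.zeros(DIM)
    s = rng.integers(nS)
    win = [rng.choice(nO, p=O[s]) for _ in range(WINDOW)]
    a = rng.choice(nA, p=policy_probs(theta, win))
    for t in range(2000):
        s, o2, r = env_step(s, a)
        win2 = win[1:] + [o2]
        a2 = rng.choice(nA, p=policy_probs(theta, win2))
        td = r + GAMMA * (w @ phi(win2, a2)) - (w @ phi(win, a))
        w += 0.1 * td * phi(win, a)
        win, a = win2, a2
    # Actor: natural policy gradient step. For a log-linear policy with these
    # compatible one-hot features, the NPG update reduces to a step along the
    # critic weights (up to an action-independent baseline).
    theta += 0.5 * w

# Evaluate the learned window policy by a rollout.
s = rng.integers(nS)
win = [rng.choice(nO, p=O[s]) for _ in range(WINDOW)]
total = 0.0
for t in range(5000):
    a = rng.choice(nA, p=policy_probs(theta, win))
    s, o2, r = env_step(s, a)
    total += r
    win = win[1:] + [o2]
print("average reward:", total / 5000)

Because the policy is log-linear in the same one-hot (window, action) features the critic uses, the natural gradient step collapses to a single parameter update, which is why the actor update above is one line; this is a standard simplification for compatible function approximation rather than the paper's general actor-critic scheme.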

Publication status

published

Volume

6 (4)

Pages / Article No.

869–896

Publisher

SIAM

Subject

reinforcement learning; partially observable Markov decision processes; natural policy gradient; actor-critic; filter stability

Funding

207343 - RING: Robust Intelligence with Nonconvex Games (SNF)
