Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity



Date

2023

Publication Type

Conference Paper

ETH Bibliography

yes

Abstract

Zero-sum Linear Quadratic (LQ) games are fundamental in optimal control and can be used (i) as a dynamic game formulation for risk-sensitive or robust control, or (ii) as a benchmark setting for multi-agent reinforcement learning with two competing agents in continuous state-control spaces. In contrast to the well-studied single-agent linear quadratic regulator problem, zero-sum LQ games entail solving a challenging nonconvex-nonconcave min-max problem with an objective function that lacks coercivity. Recently, Zhang et al. (2021) discovered an implicit regularization property of natural policy gradient methods which is crucial for safety-critical control systems since it preserves the robustness of the controller during learning. Moreover, in the model-free setting where the knowledge of model parameters is not available, Zhang et al. (2021) proposed the first polynomial sample complexity algorithm to reach an $\epsilon$-neighborhood of the Nash equilibrium while maintaining the desirable implicit regularization property. In this work, we propose a simpler nested Zeroth-Order (ZO) algorithm improving sample complexity by several orders of magnitude. Our main result guarantees a $\widetilde{O}(\epsilon^{-3})$ sample complexity under the same assumptions using a single-point ZO estimator. Furthermore, when the estimator is replaced by a two-point estimator, our method enjoys even faster convergence with a $\widetilde{O}(\epsilon^{-2})$ sample complexity. Our key improvements rely on a more sample-efficient nested algorithm design and finer control of the ZO natural gradient estimation error.
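The difference between the single-point and two-point zeroth-order estimators mentioned in the abstract can be illustrated with a minimal sketch. This is a generic sphere-smoothing gradient estimator applied to an arbitrary scalar function, not the nested natural policy gradient algorithm of the paper; the function names, the smoothing radius, and the quadratic test objective are all assumptions chosen for illustration.

```python
import numpy as np


def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    u = rng.standard_normal(dim)
    return u / np.linalg.norm(u)


def single_point_estimate(f, x, radius, rng):
    """One function evaluation per sample: (d / r) * f(x + r u) * u."""
    u = sample_unit_sphere(x.size, rng)
    return (x.size / radius) * f(x + radius * u) * u


def two_point_estimate(f, x, radius, rng):
    """Two evaluations per sample: (d / 2r) * (f(x + r u) - f(x - r u)) * u."""
    u = sample_unit_sphere(x.size, rng)
    diff = f(x + radius * u) - f(x - radius * u)
    return (x.size / (2.0 * radius)) * diff * u


if __name__ == "__main__":
    # Hypothetical smooth objective with known gradient 2x (illustration only).
    def objective(x):
        return float(x @ x)

    rng = np.random.default_rng(0)
    x0 = np.array([1.0, -2.0, 0.5])
    for estimator in (single_point_estimate, two_point_estimate):
        samples = np.array(
            [estimator(objective, x0, 0.5, rng) for _ in range(5000)]
        )
        # Compare the averaged estimate and its spread across samples.
        print(estimator.__name__, samples.mean(axis=0), samples.var(axis=0).sum())
```

Both estimators target the gradient of a smoothed version of the objective, but the two-point difference cancels the function value at the base point, which typically reduces variance. That intuition is in line with the better rate the abstract reports for the two-point variant, although the paper's analysis concerns natural gradient estimates in the LQ game setting rather than this toy example.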

Publication status

published

Book title

2023 62nd IEEE Conference on Decision and Control (CDC)

Pages / Article No.

2602 - 2609

Publisher

IEEE

Event

62nd IEEE Conference on Decision and Control (CDC 2023)

Organisational unit

09729 - He, Niao / He, Niao
02219 - ETH AI Center / ETH AI Center
02150 - Dep. Informatik / Dep. of Computer Science

Notes

Conference lecture held on December 13, 2023.

Funding

207343 - RING: Robust Intelligence with Nonconvex Games (SNF)