Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Functions


Date

2022

Publication Type

Conference Paper

ETH Bibliography

yes

Abstract

We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of functions satisfying the gradient dominance property with exponent $1\le\alpha\le2$, which holds in a wide range of applications in machine learning and signal processing. This condition ensures that any first-order stationary point is a global optimum. We prove that the total sample complexity of SCRN in achieving an $\epsilon$-global optimum is $\mathcal{O}(\epsilon^{-7/(2\alpha)+1})$ for $1\le\alpha<3/2$ and $\tilde{\mathcal{O}}(\epsilon^{-2/\alpha})$ for $3/2\le\alpha\le2$. SCRN thus improves the best-known sample complexity of stochastic gradient descent. Even under a weak version of the gradient dominance property, which is applicable to policy-based reinforcement learning (RL), SCRN achieves the same improvement over stochastic policy gradient methods. Additionally, we show that the average sample complexity of SCRN can be reduced to $\mathcal{O}(\epsilon^{-2})$ for $\alpha=1$ using a variance reduction method with time-varying batch sizes. Experimental results in various RL settings showcase the remarkable performance of SCRN compared to first-order methods.
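For context, the gradient dominance property with exponent $\alpha$ referenced above is commonly stated as follows (a standard formulation; the paper's exact constants and assumptions may differ):

$$f(x) - f^* \;\le\; \tau_f\,\|\nabla f(x)\|^{\alpha} \quad \text{for all } x, \qquad 1\le\alpha\le2,\ \tau_f>0,$$

where $f^*$ denotes the global minimum of $f$. For $\alpha=2$ this recovers the Polyak-Łojasiewicz (PL) condition, and in general it implies that every first-order stationary point is a global optimum. In its standard form, each SCRN iteration steps to the minimizer of a cubic-regularized second-order model built from mini-batch gradient and Hessian estimates $g_t$ and $H_t$:

$$x_{t+1} = x_t + \arg\min_{s}\;\langle g_t, s\rangle + \tfrac{1}{2}\langle H_t s, s\rangle + \tfrac{M}{6}\|s\|^3,$$

with cubic regularization parameter $M>0$.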

Publication status

published

Book title

Advances in Neural Information Processing Systems 35

Pages / Article No.

10862 - 10875

Publisher

Curran

Event

36th Annual Conference on Neural Information Processing Systems (NeurIPS 2022)

Organisational unit

09729 - He, Niao / He, Niao

Notes

Poster presentation on November 29, 2022.

Funding

180545 - NCCR Automation (phase I) (SNF)
