Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution


METADATA ONLY
Loading...

Date

2022

Publication Type

Conference Paper

ETH Bibliography

yes

Citations

Altmetric
METADATA ONLY

Data

Rights / License

Abstract

Recently Loizou et al. (2021), proposed and analyzed stochastic gradient descent (SGD) with stochastic Polyak stepsize (SPS). The proposed SPS comes with strong convergence guarantees and competitive performance; however, it has two main drawbacks when it is used in non-over-parameterized regimes: (i) It requires a priori knowledge of the optimal mini-batch losses, which are not available when the interpolation condition is not satisfied (e.g., regularized objectives), and (ii) it guarantees convergence only to a neighborhood of the solution. In this work, we study the dynamics and the convergence properties of SGD equipped with new variants of the stochastic Polyak stepsize and provide solutions to both drawbacks of the original SPS. We first show that a simple modification of the original SPS that uses lower bounds instead of the optimal function values can directly solve issue (i). On the other hand, solving issue (ii) turns out to be more challenging and leads us to valuable insights into the method's behavior. We show that if interpolation is not satisfied, the correlation between SPS and stochastic gradients introduces a bias, which effectively distorts the expectation of the gradient signal near minimizers, leading to non-convergence - even if the stepsize is scaled down during training. To fix this issue, we propose DecSPS, a novel modification of SPS, which guarantees convergence to the exact minimizer - without a priori knowledge of the problem parameters. For strongly-convex optimization problems, DecSPS is the first stochastic adaptive optimization method that converges to the exact solution without restrictive assumptions like bounded iterates/gradients.

Publication status

published

Book title

Advances in Neural Information Processing Systems 35

Journal / series

Volume

Pages / Article No.

26943 - 26954

Publisher

Curran

Event

36th Annual Conference on Neural Information Processing Systems (NeurIPS 2022)

Edition / version

Methods

Software

Geographic location

Date collected

Date created

Subject

Organisational unit

09462 - Hofmann, Thomas / Hofmann, Thomas check_circle

Notes

Poster presentation on November 29, 2022.

Funding

Related publications and datasets