Taming Nonconvex Stochastic Mirror Descent with General Bregman Divergence
METADATA ONLY
Loading...
Author / Producer
Date
2024
Publication Type
Conference Paper
ETH Bibliography
yes
Citations
Altmetric
METADATA ONLY
Data
Rights / License
Abstract
This paper revisits the convergence of Stochastic Mirror Descent (SMD) in the contemporary nonconvex optimization setting. Existing results for batch-free nonconvex SMD restrict the choice of the distance generating function (DGF) to be differentiable with Lipschitz continuous gradients, thereby excluding important setups such as Shannon entropy. In this work, we present a new convergence analysis of nonconvex SMD supporting general DGF, that overcomes the above limitations and relies solely on the standard assumptions. Moreover, our convergence is established with respect to the Bregman Forward-Backward envelope, which is a stronger measure than the commonly used squared norm of gradient mapping. We further extend our results to guarantee high probability convergence under sub-Gaussian noise and global convergence under the generalized Bregman Proximal Polyak-{Ł}ojasiewicz condition. Additionally, we illustrate the advantages of our improved SMD theory in various nonconvex machine learning tasks by harnessing nonsmooth DGFs. Notably, in the context of nonconvex differentially private (DP) learning, our theory yields a simple algorithm with a (nearly) dimension-independent utility bound. For the problem of training linear neural networks, we develop provably convergent stochastic algorithms.
Permanent link
Publication status
published
Book title
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
Journal / series
Volume
238
Pages / Article No.
3493 - 3501
Publisher
PMLR
Event
27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)
Edition / version
Methods
Software
Geographic location
Date collected
Date created
Subject
Organisational unit
09729 - He, Niao / He, Niao