Taming Nonconvex Stochastic Mirror Descent with General Bregman Divergence
Field | Value | Language |
---|---|---|
dc.contributor.author | Fatkhullin, Ilyas | |
dc.contributor.author | He, Niao | |
dc.contributor.editor | Dasgupta, Sanjoy | |
dc.contributor.editor | Mandt, Stephan | |
dc.contributor.editor | Li, Yingzhen | |
dc.date.accessioned | 2024-07-08T14:41:15Z | |
dc.date.available | 2024-01-27T09:05:54Z | |
dc.date.available | 2024-02-27T15:46:35Z | |
dc.date.available | 2024-07-08T14:41:15Z | |
dc.date.issued | 2024 | |
dc.identifier.issn | 2640-3498 | |
dc.identifier.uri | http://hdl.handle.net/20.500.11850/655730 | |
dc.description.abstract | This paper revisits the convergence of Stochastic Mirror Descent (SMD) in the contemporary nonconvex optimization setting. Existing results for batch-free nonconvex SMD restrict the choice of the distance generating function (DGF) to be differentiable with Lipschitz continuous gradients, thereby excluding important setups such as Shannon entropy. In this work, we present a new convergence analysis of nonconvex SMD supporting general DGFs that overcomes the above limitations and relies solely on standard assumptions. Moreover, our convergence is established with respect to the Bregman Forward-Backward envelope, which is a stronger measure than the commonly used squared norm of the gradient mapping. We further extend our results to guarantee high-probability convergence under sub-Gaussian noise and global convergence under the generalized Bregman Proximal Polyak-Łojasiewicz condition. Additionally, we illustrate the advantages of our improved SMD theory in various nonconvex machine learning tasks by harnessing nonsmooth DGFs. Notably, in the context of nonconvex differentially private (DP) learning, our theory yields a simple algorithm with a (nearly) dimension-independent utility bound. For the problem of training linear neural networks, we develop provably convergent stochastic algorithms. | en_US |
dc.language.iso | en | en_US |
dc.publisher | PMLR | en_US |
dc.title | Taming Nonconvex Stochastic Mirror Descent with General Bregman Divergence | en_US |
dc.type | Conference Paper | |
ethz.book.title | Proceedings of The 27th International Conference on Artificial Intelligence and Statistics | en_US |
ethz.journal.title | Proceedings of Machine Learning Research | |
ethz.journal.volume | 238 | en_US |
ethz.pages.start | 3493 | en_US |
ethz.pages.end | 3501 | en_US |
ethz.event | 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024) | en_US |
ethz.event.location | Valencia, Spain | en_US |
ethz.event.date | May 2-4, 2024 | en_US |
ethz.publication.place | Cambridge, MA | en_US |
ethz.publication.status | published | en_US |
ethz.leitzahl | ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09729 - He, Niao / He, Niao | en_US |
ethz.leitzahl.certified | ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09729 - He, Niao / He, Niao | en_US |
ethz.identifier.url | https://proceedings.mlr.press/v238/fatkhullin24a.html | |
ethz.date.deposited | 2024-01-27T09:05:54Z | |
ethz.source | FORM | |
ethz.eth | yes | en_US |
ethz.availability | Metadata only | en_US |
ethz.rosetta.installDate | 2024-07-08T14:41:16Z | |
ethz.rosetta.lastUpdated | 2024-07-08T14:41:16Z | |
ethz.rosetta.versionExported | true | |
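The abstract above emphasizes that the analysis covers distance generating functions, such as Shannon entropy, that fall outside earlier smoothness requirements. As a minimal illustrative sketch only (not the authors' algorithm, experiments, or constants), the generic stochastic mirror descent step x_{k+1} = argmin_y { eta * <g_k, y> + D_h(y, x_k) } specializes, for the negative-entropy DGF on the probability simplex, to the closed-form exponentiated-gradient update shown below; the toy quadratic objective, noise level, and step size eta are placeholder assumptions.

```python
import numpy as np

def smd_entropy_step(x, grad, eta):
    """One stochastic mirror descent step on the probability simplex.

    With the negative-entropy DGF h(x) = sum_i x_i * log(x_i), the Bregman
    divergence is the KL divergence, and the mirror step
        x_+ = argmin_y { eta * <grad, y> + KL(y, x) },  y in the simplex,
    has the closed-form exponentiated-gradient update computed here.
    """
    logits = np.log(x) - eta * grad
    logits -= logits.max()          # numerical safeguard; does not change the normalized result
    x_new = np.exp(logits)
    return x_new / x_new.sum()

# Toy usage with a convex least-squares objective, purely for illustration;
# the update itself is unchanged when f is nonconvex.
rng = np.random.default_rng(0)
d = 10
A = rng.standard_normal((20, d))
b = rng.standard_normal(20)
x = np.full(d, 1.0 / d)             # start at the uniform distribution
eta = 0.1                           # placeholder step size
for k in range(200):
    noise = 0.1 * rng.standard_normal(d)        # stochastic gradient = exact gradient + noise
    grad = A.T @ (A @ x - b) + noise
    x = smd_entropy_step(x, grad, eta)
```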
Files in this item
There are no files associated with this item.
Publication type
Conference Paper