
Open access
Date
2020-06
Type
Conference Paper
ETH Bibliography
yes
Abstract
We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. We exploit these continuous-time models, together with simple Lyapunov analysis as well as tools from stochastic calculus, in order to derive convergence bounds for various types of non-convex functions. Guided by such analysis, we show that the same Lyapunov arguments hold in discrete time, leading to matching rates. In addition, we use these models and Itô calculus to infer novel insights on the dynamics of SGD, proving that a decreasing learning rate acts as time warping or, equivalently, as landscape stretching.
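As a rough illustration of the continuous-time viewpoint described in the abstract, the sketch below compares plain mini-batch SGD on a toy quadratic with an Euler–Maruyama simulation of an SDE model of the same dynamics. The specific objective, noise model, and step sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy 1-D objective f(x) = x^2 / 2, so grad f(x) = x (illustrative choice).
grad = lambda x: x

def sgd_discrete(x0, lr, noise_std, steps, rng):
    # Mini-batch SGD with additive Gaussian gradient noise:
    #   x_{k+1} = x_k - lr * (grad f(x_k) + noise)
    x = x0
    for _ in range(steps):
        x = x - lr * (grad(x) + noise_std * rng.standard_normal())
    return x

def sgd_sde(x0, lr, noise_std, steps, rng, substeps=5):
    # Euler-Maruyama discretization of a common SDE model of SGD:
    #   dX = -grad f(X) dt + sqrt(lr) * noise_std dW,
    # integrated over total time T = lr * steps with a finer step h.
    # A decreasing learning rate would correspond to reparametrizing
    # time t (the "time warping" mentioned in the abstract).
    h = lr / substeps
    x = x0
    for _ in range(steps * substeps):
        x = x - h * grad(x) \
              + np.sqrt(lr) * noise_std * np.sqrt(h) * rng.standard_normal()
    return x

rng = np.random.default_rng(0)
xs_d = np.array([sgd_discrete(5.0, 0.1, 0.5, 200, rng) for _ in range(1000)])
xs_c = np.array([sgd_sde(5.0, 0.1, 0.5, 200, rng) for _ in range(1000)])
# For a small learning rate, the stationary spread of the discrete iterates
# (variance ~ lr * sigma^2 / (2 - lr)) is close to that of the SDE model
# (variance lr * sigma^2 / 2), which is what makes the SDE a useful proxy.
```

On this quadratic both empirical variances land near 0.013, matching the closed-form stationary variances up to sampling error; the SDE model becomes exact in the limit of vanishing learning rate.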
Permanent link
https://doi.org/10.3929/ethz-b-000393853
Publication status
published
External links
Editor
Book title
Advances in Neural Information Processing Systems 32
Volume
Pages / Article No.
Publisher
Curran
Event
Subject
Machine learning (artificial intelligence)
Organisational unit
09462 - Hofmann, Thomas / Hofmann, Thomas
Related publications and datasets
Is new version of: http://hdl.handle.net/20.500.11850/309290
Notes
Conference lecture held on December 12, 2019