Parameter-Agnostic Optimization under Relaxed Smoothness
METADATA ONLY
Loading...
Author / Producer
Date
2024
Publication Type
Conference Paper
ETH Bibliography
yes
Citations
Altmetric
METADATA ONLY
Data
Rights / License
Abstract
Tuning hyperparameters, such as the stepsize, presents a major challenge of training machine learning models. To address this challenge, numerous adaptive optimization algorithms have been developed that achieve near-optimal complexities, even when stepsizes are independent of problem-specific parameters, provided that the loss function is $L$-smooth. However, as the assumption is relaxed to the more realistic $(L_0, L_1)$-smoothness, all existing convergence results still necessitate tuning of the stepsize. In this study, we demonstrate that Normalized Stochastic Gradient Descent with Momentum (NSGD-M) can achieve a (nearly) rate-optimal complexity without prior knowledge of any problem parameter, though this comes at the cost of introducing an exponential term dependent on $L_1$ in the complexity. We further establish that this exponential term is inevitable to such schemes by introducing a theoretical framework of lower bounds tailored explicitly for parameter-agnostic algorithms. Interestingly, in deterministic settings, the exponential factor can be neutralized by employing Gradient Descent with a Backtracking Line Search. To the best of our knowledge, these findings represent the first parameter-agnostic convergence results under the generalized smoothness condition. Our empirical experiments further confirm our theoretical insights.
Permanent link
Publication status
published
Book title
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
Journal / series
Volume
238
Pages / Article No.
4861 - 4869
Publisher
PMLR
Event
27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)
Edition / version
Methods
Software
Geographic location
Date collected
Date created
Subject
Organisational unit
09729 - He, Niao / He, Niao
Notes
Funding
207343 - RING: Robust Intelligence with Nonconvex Games (SNF)
Related publications and datasets
Is new version of: https://arxiv.org/pdf/2311.03252.pdf