
Open access
Author
Date
2022-09
Type
Master Thesis
ETH Bibliography
yes
Abstract
In optimization, one notable gap between theory and practice is that algorithms with theoretical convergence guarantees usually require prior knowledge of problem-dependent parameters, which are often unavailable in practice. A prominent example is the stepsize used by gradient descent in nonconvex minimization problems. Compared to vanilla gradient descent, adaptive gradient methods can adjust the stepsizes on the fly in a parameter-agnostic manner and empirically achieve faster convergence when solving minimization problems. However, when it comes to nonconvex minimax optimization, the need for careful tuning of hyper-parameters or knowledge of problem-dependent parameters cannot be easily alleviated by directly combining the most commonly used minimax optimization algorithm, gradient descent ascent (GDA), with adaptive stepsizes. This discrepancy arises from the primal-dual nature of minimax problems and the delicate time-scale separation between the primal and dual updates that is needed to attain convergence. In this work, we investigate why AdaGrad is able to adapt to parameters in minimization and, following this intuition, we propose a single-loop adaptive GDA algorithm called TiAda for nonconvex minimax optimization that automatically adapts to the time-scale separation. Our algorithm is fully parameter-agnostic and achieves near-optimal complexities simultaneously in the deterministic and stochastic settings of nonconvex-strongly-concave minimax problems. The effectiveness of the proposed method is further demonstrated numerically on a number of machine learning applications.
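The abstract describes the mechanism only at a high level. As a rough illustration of the general idea of combining AdaGrad-style stepsizes with a time-scale separation between primal and dual updates, the sketch below runs an adaptive gradient descent ascent loop on a toy quadratic minimax problem. The objective, the stepsize exponents alpha and beta, and the coupling of the x-stepsize to max(vx, vy) are illustrative assumptions for this sketch, not the thesis's exact TiAda update.

import numpy as np

# Toy minimax objective f(x, y) = 0.5*x^2 + 2*x*y - 0.5*y^2
# (strongly concave in y; used here only to illustrate the updates).
def grad_x(x, y):
    return x + 2.0 * y

def grad_y(x, y):
    return 2.0 * x - y

def adaptive_gda(x0, y0, steps=2000, eta_x=1.0, eta_y=1.0, alpha=0.6, beta=0.4):
    """AdaGrad-style gradient descent ascent with a built-in time-scale
    separation: the x-player's effective stepsize decays faster (exponent
    alpha > beta) and is tied to the larger of the two gradient
    accumulators, so the primal update is automatically slower than the
    dual update. Illustrative sketch only, not the thesis's exact algorithm."""
    x, y = float(x0), float(y0)
    vx, vy = 1e-8, 1e-8  # accumulated squared gradient magnitudes
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        vx += gx ** 2
        vy += gy ** 2
        # Coupling through max(vx, vy) keeps the x-stepsize scale no larger
        # than the y-stepsize scale, enforcing the needed separation.
        x -= eta_x / (max(vx, vy) ** alpha) * gx
        y += eta_y / (vy ** beta) * gy
    return x, y

if __name__ == "__main__":
    x_star, y_star = adaptive_gda(1.0, 1.0)
    print(f"approximate stationary point: x={x_star:.4f}, y={y_star:.4f}")

On this toy problem the unique stationary point is (0, 0), and the run above should end near it without any hand-tuned stepsizes, which is the parameter-agnostic behavior the abstract refers to.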
Permanent link
https://doi.org/10.3929/ethz-b-000575744
Publication status
published
Publisher
ETH Zurich, Department of Computer Science
Organisational unit
09729 - He, Niao / He, Niao