Open access
Author
Date
2023
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
In machine learning, the training process refines models by extracting patterns from vast datasets. This refinement typically hinges on an optimization formulation that minimizes a task-specific loss function over the model's parameters. While numerous efficient first-order optimization methods exist, they may fall short on contemporary machine learning tasks. For instance, Generative Adversarial Networks (GANs) employ a two-player approach: one player generates data emulating the training data distribution, while the other discriminates between generated and real data. Similarly, adversarial training pits a model against worst-case scenarios, anticipating perturbations of the training data. Such two-player or worst-case tasks are typically framed as minimax optimization, where one variable seeks to minimize a loss function and another aims to maximize it.
Minimax optimization, expressed as minₓ maxᵧ f(x, y), is pivotal across various domains. While its study traces back to game theory and variational inequalities, we spotlight three predominant challenges in its application to modern machine learning: asymmetry, non-convexity, and adaptivity.
The first part of this thesis addresses the asymmetry challenge. In real-world scenarios, the loss function often exhibits asymmetric convexity/concavity properties with respect to the two variables; for instance, it may be nonconvex in x yet concave in y. Optimal algorithms for such unbalanced minimax problems remain elusive, especially when the objective function takes a finite-sum form, and current solutions are intricate, with distinct algorithms tailored to specific settings. We propose a universal "Catalyst" framework, drawing inspiration from proximal point methods: it solves a sequence of regularized problems using algorithms designed for the balanced regime, achieving near-optimal or state-of-the-art complexities in unbalanced settings.
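The proximal-point idea behind such a Catalyst-style wrapper can be illustrated with a minimal sketch. This is a hypothetical toy, not the thesis's actual algorithm: the function names and parameters are illustrative, and the inner solver is plain gradient descent ascent rather than a tailored balanced-regime method. Each outer iteration regularizes the objective around the current iterate, producing a strongly-convex-strongly-concave subproblem that standard methods solve reliably.

```python
def catalyst_minimax(grad_x, grad_y, x, y, tau=1.0,
                     outer_iters=50, inner_iters=100, eta=0.05):
    """Sketch of a Catalyst-style wrapper (illustrative only): each outer
    step adds proximal regularization around the current iterate and solves
    the regularized subproblem
        min_x max_y  f(x, y) + tau/2 (x - xc)^2 - tau/2 (y - yc)^2
    with a simple inner gradient descent ascent loop."""
    for _ in range(outer_iters):
        xc, yc = x, y  # proximal centers for this outer step
        for _ in range(inner_iters):
            # gradients of the regularized subproblem
            gx = grad_x(x, y) + tau * (x - xc)
            gy = grad_y(x, y) - tau * (y - yc)
            x, y = x - eta * gx, y + eta * gy
    return x, y

# Toy bilinear objective f(x, y) = x * y, whose saddle point is the origin;
# plain simultaneous gradient descent ascent spirals outward on it, while
# the proximal wrapper contracts toward (0, 0).
x, y = catalyst_minimax(lambda x, y: y, lambda x, y: x, 1.0, 1.0)
```

On this bilinear toy, each exact proximal step shrinks the iterate norm by a constant factor, which is why the regularization stabilizes an otherwise divergent dynamic.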
The subsequent part delves into problems that are simultaneously nonconvex in x and nonconcave in y. While some studies highlight the intractability of general nonconvex-nonconcave minimax problems, we argue that discerning special structure can pave the way for efficient algorithms. For instance, when the objective function satisfies the Polyak-Łojasiewicz (PL) inequality in both variables, we demonstrate that Alternating Gradient Descent Ascent (AGDA), a prevalent single-loop algorithm, finds the global solution. If the inequality holds for just one variable, AGDA and its regularized counterpart find stationary points.
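As a concrete illustration of the alternating scheme, here is a minimal AGDA sketch on a toy quadratic saddle problem. The objective f(x, y) = x²/2 + xy − y²/2 is an assumption for illustration; it is strongly convex in x and strongly concave in y, which implies the two-sided PL condition discussed above.

```python
def agda(grad_x, grad_y, x, y, eta=0.1, iters=500):
    """Alternating Gradient Descent Ascent: a single-loop method that first
    takes a descent step in x, then an ascent step in y using the updated x."""
    for _ in range(iters):
        x = x - eta * grad_x(x, y)   # descent step on x
        y = y + eta * grad_y(x, y)   # ascent step on y, with the new x
    return x, y

# Toy objective f(x, y) = x^2/2 + x*y - y^2/2, saddle point at the origin.
# Gradients: df/dx = x + y, df/dy = x - y.
x, y = agda(lambda x, y: x + y, lambda x, y: x - y, 1.0, 1.0)
```

The alternating update (using the fresh x when updating y) is what distinguishes AGDA from simultaneous gradient descent ascent; on this toy it converges linearly to the saddle point.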
Lastly, our focus shifts to adaptive methods for nonconvex minimax optimization, aiming to obviate stepsize tuning. We observe that Gradient Descent Ascent, when paired with prevalent adaptive stepsize schemes, still fails to converge without manual tuning. This inconsistency might underpin the unstable training observed in minimax optimization, especially in GANs. We introduce a nested-loop algorithm, combined with AdaGrad, that adaptively balances updates in x and y, ensuring convergence without stepsize tuning.
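To make the adaptive-stepsize mechanics concrete, here is a sketch of gradient descent ascent with per-variable AdaGrad accumulators. Note this is the baseline scheme discussed above, not the thesis's nested-loop algorithm, and the toy objective is an illustrative easy case where no tuning issues arise; the thesis shows such untuned schemes can fail on harder nonconvex problems.

```python
import math

def adagrad_gda(grad_x, grad_y, x, y, lr=0.5, iters=500, eps=1e-8):
    """Gradient descent ascent with per-variable AdaGrad stepsizes: each
    variable accumulates its squared gradients and its update is scaled
    by lr / sqrt(accumulator), so no stepsize schedule is tuned by hand."""
    vx = vy = 0.0
    for _ in range(iters):
        gx, gy = grad_x(x, y), grad_y(x, y)
        vx += gx * gx                      # AdaGrad accumulator for x
        vy += gy * gy                      # AdaGrad accumulator for y
        x = x - lr * gx / math.sqrt(vx + eps)
        y = y + lr * gy / math.sqrt(vy + eps)
    return x, y

# Easy strongly-convex-strongly-concave toy f(x, y) = x^2/2 + x*y - y^2/2,
# saddle point at the origin; the adaptive stepsizes settle to constants
# and the iterates converge.
x, y = adagrad_gda(lambda x, y: x + y, lambda x, y: x - y, 1.0, 1.0)
```

Because the accumulators grow at different rates for x and y, the two variables end up with very different effective stepsizes, which hints at why balancing the two updates, as the proposed algorithm does, matters in harder regimes.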
Permanent link
https://doi.org/10.3929/ethz-b-000658589
Publication status
published
External links
Search print copy at ETH Library
Contributors
Examiner: He, Niao
Examiner: Hofmann, Thomas
Examiner: Kiyavash, Negar
Examiner: Razaviyayn, Meisam
Publisher
ETH Zurich
Subject
machine learning; Optimization algorithms; computer science
Organisational unit
09729 - He, Niao / He, Niao