Open access
Author
Date
2023
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
In machine learning, the training process refines models by extracting patterns from vast datasets. This refinement typically hinges on an optimization formulation that minimizes a task-specific loss function over the model's parameters. While numerous efficient first-order optimization methods exist, they may fall short on contemporary machine learning tasks. For instance, Generative Adversarial Networks (GANs) employ a two-player approach: one player generates data emulating the training data distribution, while the other discriminates between generated and real data. Similarly, adversarial training pits a model against worst-case scenarios, anticipating perturbations of the training data. Such two-player or worst-case tasks are typically framed as minimax optimization, where one variable seeks to minimize a loss function and another aims to maximize it.
Minimax optimization, expressed as minₓ maxᵧ f(x, y), is pivotal across various domains. While its study traces back to game theory and variational inequalities, we spotlight three predominant challenges in its application to modern machine learning: asymmetry, non-convexity, and adaptivity.
The first part of this thesis addresses the asymmetry challenge. In real-world scenarios, the loss function often exhibits asymmetric convexity/concavity properties with respect to the two variables; for instance, it may be nonconvex in x yet concave in y. Optimal algorithms for such unbalanced minimax problems remain elusive, especially when the objective function takes a finite-sum form, and current solutions are intricate, with distinct algorithms tailored to specific settings. We propose a universal "Catalyst" framework, drawing inspiration from proximal point methods: it solves a sequence of regularized problems using algorithms designed for the balanced regime, achieving near-optimal or state-of-the-art complexities in unbalanced settings.
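The proximal-point idea behind such a Catalyst-style wrapper can be illustrated with a minimal sketch. This is a hypothetical toy, not the thesis's actual algorithm: the function names and parameters are illustrative, and the inner solver is plain gradient descent ascent rather than a tailored balanced-regime method. Each outer iteration regularizes the objective around the current iterate, producing a strongly-convex-strongly-concave subproblem that standard methods solve reliably.

```python
def catalyst_minimax(grad_x, grad_y, x, y, tau=1.0,
                     outer_iters=50, inner_iters=100, eta=0.05):
    """Sketch of a Catalyst-style wrapper (illustrative only): each outer
    step adds proximal regularization around the current iterate and solves
    the regularized subproblem
        min_x max_y  f(x, y) + tau/2 (x - xc)^2 - tau/2 (y - yc)^2
    with a simple inner gradient descent ascent loop."""
    for _ in range(outer_iters):
        xc, yc = x, y  # proximal centers for this outer step
        for _ in range(inner_iters):
            # gradients of the regularized subproblem
            gx = grad_x(x, y) + tau * (x - xc)
            gy = grad_y(x, y) - tau * (y - yc)
            x, y = x - eta * gx, y + eta * gy
    return x, y

# Toy bilinear objective f(x, y) = x * y, whose saddle point is the origin;
# plain simultaneous gradient descent ascent spirals outward on it, while
# the proximal wrapper contracts toward (0, 0).
x, y = catalyst_minimax(lambda x, y: y, lambda x, y: x, 1.0, 1.0)
```

On this bilinear toy, each exact proximal step shrinks the iterate norm by a constant factor, which is why the regularization stabilizes an otherwise divergent dynamic.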
The subsequent part delves into problems that are simultaneously nonconvex in x and nonconcave in y. While some studies highlight the intractability of general nonconvex-nonconcave minimax problems, we argue that discerning special structure can pave the way for efficient algorithms. For instance, when the objective function satisfies the Polyak-Łojasiewicz (PL) inequality in both variables, we demonstrate that Alternating Gradient Descent Ascent (AGDA), a prevalent single-loop algorithm, finds the global solution. If the inequality holds for just one variable, AGDA and its regularized counterpart find stationary points.
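As a concrete illustration of the alternating scheme, here is a minimal AGDA sketch on a toy quadratic saddle problem. The objective f(x, y) = x²/2 + xy − y²/2 is an assumption for illustration; it is strongly convex in x and strongly concave in y, which implies the two-sided PL condition discussed above.

```python
def agda(grad_x, grad_y, x, y, eta=0.1, iters=500):
    """Alternating Gradient Descent Ascent: a single-loop method that first
    takes a descent step in x, then an ascent step in y using the updated x."""
    for _ in range(iters):
        x = x - eta * grad_x(x, y)   # descent step on x
        y = y + eta * grad_y(x, y)   # ascent step on y, with the new x
    return x, y

# Toy objective f(x, y) = x^2/2 + x*y - y^2/2, saddle point at the origin.
# Gradients: df/dx = x + y, df/dy = x - y.
x, y = agda(lambda x, y: x + y, lambda x, y: x - y, 1.0, 1.0)
```

The alternating update (using the fresh x when updating y) is what distinguishes AGDA from simultaneous gradient descent ascent; on this toy it converges linearly to the saddle point.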
Lastly, our focus shifts to adaptive methods for nonconvex minimax optimization, aiming to obviate stepsize tuning. We observe that Gradient Descent Ascent, when paired with prevalent adaptive stepsize schemes, still fails to converge without manual tuning. This inconsistency might underpin the unstable training observed in minimax optimization, especially in GANs. We introduce a nested-loop algorithm, combined with AdaGrad, that adaptively balances updates in x and y, ensuring convergence without stepsize tuning.
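To make the adaptive-stepsize mechanics concrete, here is a sketch of gradient descent ascent with per-variable AdaGrad accumulators. Note this is the baseline scheme discussed above, not the thesis's nested-loop algorithm, and the toy objective is an illustrative easy case where no tuning issues arise; the thesis shows such untuned schemes can fail on harder nonconvex problems.

```python
import math

def adagrad_gda(grad_x, grad_y, x, y, lr=0.5, iters=500, eps=1e-8):
    """Gradient descent ascent with per-variable AdaGrad stepsizes: each
    variable accumulates its squared gradients and its update is scaled
    by lr / sqrt(accumulator), so no stepsize schedule is tuned by hand."""
    vx = vy = 0.0
    for _ in range(iters):
        gx, gy = grad_x(x, y), grad_y(x, y)
        vx += gx * gx                      # AdaGrad accumulator for x
        vy += gy * gy                      # AdaGrad accumulator for y
        x = x - lr * gx / math.sqrt(vx + eps)
        y = y + lr * gy / math.sqrt(vy + eps)
    return x, y

# Easy strongly-convex-strongly-concave toy f(x, y) = x^2/2 + x*y - y^2/2,
# saddle point at the origin; the adaptive stepsizes settle to constants
# and the iterates converge.
x, y = adagrad_gda(lambda x, y: x + y, lambda x, y: x - y, 1.0, 1.0)
```

Because the accumulators grow at different rates for x and y, the two variables end up with very different effective stepsizes, which hints at why balancing the two updates, as the proposed algorithm does, matters in harder regimes.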
Permanent link
https://doi.org/10.3929/ethz-b-000658589
Publication status
published
External links
Search print copy at ETH Library
Contributors
Examiner: He, Niao
Examiner: Hofmann, Thomas
Examiner: Kiyavash, Negar
Examiner: Razaviyayn, Meisam
Publisher
ETH Zurich
Subject
machine learning; Optimization algorithms; computer science
Organisational unit
09729 - He, Niao / He, Niao