Training Fully Connected Neural Networks is $\exists\mathbb{R}$-Complete
Date
2024-07
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
We consider the algorithmic problem of finding the optimal weights and biases for a two-layer fully connected neural network to fit a given set of data points. This problem is known as empirical risk minimization in the machine learning community. We show that the problem is $\exists\mathbb{R}$-complete. This complexity class can be defined as the set of algorithmic problems that are polynomial-time equivalent to finding real roots of a polynomial with integer coefficients. Furthermore, we show that arbitrary algebraic numbers are required as weights to be able to train some instances to optimality, even if all data points are rational. Our results hold even if the following restrictions are all added simultaneously:
$\bullet$ There are exactly two output neurons.
$\bullet$ There are exactly two input neurons.
$\bullet$ The data has only 13 different labels.
$\bullet$ The number of hidden neurons is a constant fraction of the number of data points.
$\bullet$ The target training error is zero.
$\bullet$ The ReLU activation function is used.
This shows that even very simple networks are difficult to train. The result explains why typical methods for $\mathsf{NP}$-complete problems, like mixed-integer programming or SAT-solving, cannot train neural networks to global optimality, unless $\mathsf{NP}=\exists\mathbb{R}$. We strengthen a recent result by Abrahamsen, Kleist and Miltzow [NeurIPS 2021].
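The empirical risk minimization problem from the abstract can be made concrete with a small numerical sketch. The following is an illustration only, not the paper's construction: it assumes squared loss, and the names (`relu`, `training_error`) and the toy data are hypothetical. It evaluates the training error of a two-layer fully connected ReLU network with two input and two output neurons, matching the restricted setting described above.

```python
import numpy as np

def relu(x):
    # ReLU activation, applied elementwise
    return np.maximum(x, 0.0)

def training_error(X, Y, W1, b1, W2, b2):
    """Total squared training error of a two-layer fully connected ReLU network.

    X: (n, 2) data points (two input neurons, as in the paper's setting).
    Y: (n, 2) labels (two output neurons).
    W1, b1: hidden-layer weights and biases; W2, b2: output-layer weights and biases.
    """
    H = relu(X @ W1 + b1)      # hidden-layer activations
    pred = H @ W2 + b2         # linear output layer
    return float(np.sum((pred - Y) ** 2))

# Tiny instance: one hidden neuron fits two data points exactly (error zero).
X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[0.0, 0.0], [1.0, 1.0]])
W1 = np.array([[1.0], [0.0]])   # hidden unit computes relu(x1)
b1 = np.array([0.0])
W2 = np.array([[1.0, 1.0]])     # both outputs copy the hidden unit
b2 = np.array([0.0, 0.0])
```

The decision problem studied in the paper asks whether weights and biases achieving a target training error (here, zero) exist at all; the hardness result says this question is $\exists\mathbb{R}$-complete even in this heavily restricted architecture.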
Publication status
published
Book title
Advances in Neural Information Processing Systems 36
Pages / Article No.
36222 - 36237
Publisher
Curran
Event
37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023)
Subject
Computational Complexity (cs.CC); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); FOS: Computer and information sciences
Organisational unit
03457 - Welzl, Emo (emeritus) / Welzl, Emo (emeritus)
Related publications and datasets
Is new version of: 10.48550/ARXIV.2204.01368
Is new version of: https://openreview.net/forum?id=XZf2bnMBag