Notice

This record has been edited as far as possible; missing data will be added when the version of record is issued.

Show simple item record

dc.contributor.author: Bertschinger, Daniel
dc.contributor.author: Hertrich, Christoph
dc.contributor.author: Jungeblut, Paul
dc.contributor.author: Miltzow, Tillmann
dc.contributor.author: Weber, Simon
dc.contributor.editor: Oh, Alice
dc.contributor.editor: Neumann, Thomas
dc.contributor.editor: Globerson, Amir
dc.contributor.editor: Saenko, Kate
dc.contributor.editor: Hardt, Moritz
dc.contributor.editor: Levine, Sergey
dc.date.accessioned: 2024-04-02T13:44:13Z
dc.date.available: 2024-01-15T12:35:47Z
dc.date.available: 2024-04-02T13:44:13Z
dc.date.issued: 2023-12
dc.identifier.uri: http://hdl.handle.net/20.500.11850/652704
dc.description.abstract (en_US):
We consider the algorithmic problem of finding the optimal weights and biases for a two-layer fully connected neural network to fit a given set of data points. This problem is known as empirical risk minimization in the machine learning community. We show that the problem is $\exists\mathbb{R}$-complete. This complexity class can be defined as the set of algorithmic problems that are polynomial-time equivalent to finding real roots of a polynomial with integer coefficients. Furthermore, we show that arbitrary algebraic numbers are required as weights to be able to train some instances to optimality, even if all data points are rational. Our results hold even if the following restrictions are all added simultaneously:
$\bullet$ There are exactly two output neurons.
$\bullet$ There are exactly two input neurons.
$\bullet$ The data has only 13 different labels.
$\bullet$ The number of hidden neurons is a constant fraction of the number of data points.
$\bullet$ The target training error is zero.
$\bullet$ The ReLU activation function is used.
This shows that even very simple networks are difficult to train. The result explains why typical methods for $\mathsf{NP}$-complete problems, like mixed-integer programming or SAT-solving, cannot train neural networks to global optimality, unless $\mathsf{NP}=\exists\mathbb{R}$. We strengthen a recent result by Abrahamsen, Kleist and Miltzow [NeurIPS 2021].
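As a concrete illustration of the problem in the abstract, the following Python sketch evaluates the exact training error of a two-layer fully connected ReLU network with two inputs and two outputs; the $\exists\mathbb{R}$-complete decision problem asks whether weights and biases exist that make this error exactly zero. The toy data, network size, and squared-error loss below are illustrative assumptions, not the paper's construction.

import numpy as np

# Illustrative sketch only: the data, dimensions, and loss are assumptions,
# not the reduction from the paper.
def relu(z):
    return np.maximum(z, 0.0)

def training_error(W1, b1, W2, b2, X, Y):
    """Exact squared training error of a two-layer fully connected ReLU net.
    W1: (h, 2) first-layer weights, b1: (h,) biases,
    W2: (2, h) second-layer weights, b2: (2,) biases,
    X: (n, 2) rational inputs, Y: (n, 2) rational labels."""
    hidden = relu(X @ W1.T + b1)   # ReLU hidden layer
    out = hidden @ W2.T + b2       # two linear output neurons
    return float(np.sum((out - Y) ** 2))

# The decision problem: do (W1, b1, W2, b2) exist with error exactly 0?
# The paper shows some rational instances need irrational algebraic weights,
# so enumerating rational candidates can never certify global optimality.
rng = np.random.default_rng(0)
n = 8                                   # toy number of data points
h = n // 2                              # hidden size: a constant fraction of n
X = rng.integers(-3, 4, (n, 2)).astype(float)
Y = rng.integers(-3, 4, (n, 2)).astype(float)
W1, b1 = rng.standard_normal((h, 2)), rng.standard_normal(h)
W2, b2 = rng.standard_normal((2, h)), rng.standard_normal(2)
print(training_error(W1, b1, W2, b2, X, Y))

Gradient methods can push this error toward zero numerically; the hardness result concerns deciding whether an exact zero is attainable at all.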
dc.language.iso: en (en_US)
dc.publisher: Curran (en_US)
dc.subject: Computational Complexity (cs.CC) (en_US)
dc.subject: Machine Learning (cs.LG) (en_US)
dc.subject: Neural and Evolutionary Computing (cs.NE) (en_US)
dc.subject: FOS: Computer and information sciences (en_US)
dc.title: Training Fully Connected Neural Networks is $\exists\mathbb{R}$-Complete (en_US)
dc.type: Conference Paper
ethz.book.title: Advances in Neural Information Processing Systems 36 (en_US)
ethz.pages.start: 36222 (en_US)
ethz.pages.end: 36237 (en_US)
ethz.event: 37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023) (en_US)
ethz.event.location: New Orleans, LA, USA (en_US)
ethz.event.date: December 10-16, 2023 (en_US)
ethz.publication.place: Red Hook, NY (en_US)
ethz.publication.status: published (en_US)
ethz.leitzahl: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02643 - Institut für Theoretische Informatik / Inst. Theoretical Computer Science::03672 - Steger, Angelika / Steger, Angelika::03457 - Welzl, Emo (emeritus) / Welzl, Emo (emeritus) (en_US)
ethz.leitzahl.certified: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02643 - Institut für Theoretische Informatik / Inst. Theoretical Computer Science::03672 - Steger, Angelika / Steger, Angelika::03457 - Welzl, Emo (emeritus) / Welzl, Emo (emeritus) (en_US)
ethz.identifier.url: https://papers.nips.cc/paper_files/paper/2023/hash/71c31ebf577ffdad5f4a74156daad518-Abstract-Conference.html
ethz.identifier.url: https://neurips.cc/virtual/2023/poster/73560
ethz.relation.isNewVersionOf: 10.48550/ARXIV.2204.01368
ethz.relation.isNewVersionOf: https://openreview.net/forum?id=XZf2bnMBag
ethz.date.deposited: 2024-01-15T12:35:47Z
ethz.source: FORM
ethz.eth: yes (en_US)
ethz.availability: Metadata only (en_US)
ethz.rosetta.exportRequired: true

