Analytic Insights into Structure and Rank of Neural Network Hessian Maps
METADATA ONLY
Loading...
Author / Producer
Date
2021
Publication Type
Conference Paper
ETH Bibliography
yes
Citations
Altmetric
METADATA ONLY
Data
Rights / License
Abstract
The Hessian of a neural network captures parameter interactions through second order derivatives of the loss. It is a fundamental object of study, closely tied to various problems in deep learning, including model design, optimization, and generalization. Most prior work has been empirical, typically focusing on low-rank approximations and heuristics that are blind to the network structure. In contrast, we develop theoretical tools to analyze the range of the Hessian map, which provide us with a precise understanding of its rank deficiency and the structural reasons behind it. This yields exact formulas and tight upper bounds for the Hessian rank of deep linear networks — allowing for an elegant interpretation in terms of rank deficiency. Moreover, we demonstrate that our bounds remain faithful as an estimate of the numerical Hessian rank, for a larger class of models such as rectified and hyperbolic tangent networks. Further, we also investigate the implications of model architecture (e.g. width, depth, bias) on the rank deficiency. Overall, our work provides novel insights into the source and extent of redundancy in overparameterized neural networks.
Permanent link
Publication status
published
Book title
Advances in Neural Information Processing Systems 34
Journal / series
Volume
Pages / Article No.
23914 - 23927
Publisher
Curran
Event
35th Annual Conference on Neural Information Processing Systems (NeurIPS 2021)
Edition / version
Methods
Software
Geographic location
Date collected
Date created
Subject
Organisational unit
09462 - Hofmann, Thomas / Hofmann, Thomas