Attentive Learning Facilitates Generalization of Neural Networks


Date

2025-02

Publication Type

Journal Article

ETH Bibliography

yes

Abstract

This article studies the generalization of neural networks (NNs) by examining how a network changes when it is trained on a sample with or without out-of-distribution (OoD) examples. If the network's predictions are less influenced by fitting the OoD examples, the network learns attentively from the clean training set. A new notion, dataset-distraction stability, is proposed to measure this influence. Extensive experiments on CIFAR-10/100, across VGG, ResNet, WideResNet, and ViT architectures and across different optimizers, show a negative correlation between dataset-distraction stability and generalizability. Using the distraction stability, we decompose the learning process on the training set $\mathcal{S}$ into multiple learning processes on subsets of $\mathcal{S}$ drawn from simpler distributions, i.e., distributions of smaller intrinsic dimension (ID), and thereby derive a tighter generalization bound. Attentive learning thus helps explain the seemingly miraculous generalization of deep learning and also suggests the design of novel algorithms.
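
The paper's formal definition of dataset-distraction stability is not reproduced in this record. The following is a minimal PyTorch sketch of one plausible proxy, assuming the quantity is estimated by comparing the predictions of two identically initialized networks, one trained on the clean set and one on the clean set plus OoD examples. The toy data, the two-layer model, and the KL-divergence measure are illustrative assumptions, not the authors' exact construction.

import copy
import torch
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, ConcatDataset

def train(model, loader, epochs=5, lr=1e-3):
    """Plain cross-entropy training loop."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model

@torch.no_grad()
def distraction_stability(model_clean, model_distracted, loader):
    """Assumed proxy: mean KL(p_distracted || p_clean) on held-out clean
    data. Smaller values mean fitting the OoD examples changed the
    predictions less, i.e., more attentive learning on the clean set."""
    model_clean.eval()
    model_distracted.eval()
    total, n = 0.0, 0
    for x, _ in loader:
        log_p_clean = F.log_softmax(model_clean(x), dim=1)
        p_distracted = F.softmax(model_distracted(x), dim=1)
        # F.kl_div(input=log q, target=p) computes KL(p || q).
        total += F.kl_div(log_p_clean, p_distracted,
                          reduction="batchmean").item() * len(x)
        n += len(x)
    return total / n

# Toy random tensors stand in for CIFAR-style data; the +3.0 shift makes
# the OoD inputs lie off the clean distribution.
torch.manual_seed(0)
clean = TensorDataset(torch.randn(512, 32), torch.randint(0, 10, (512,)))
ood = TensorDataset(torch.randn(64, 32) + 3.0, torch.randint(0, 10, (64,)))
held_out = TensorDataset(torch.randn(256, 32), torch.randint(0, 10, (256,)))

# Identical initialization for both runs, so only the training data differs.
base = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
model_clean = train(copy.deepcopy(base),
                    DataLoader(clean, batch_size=64, shuffle=True))
model_distracted = train(copy.deepcopy(base),
                         DataLoader(ConcatDataset([clean, ood]),
                                    batch_size=64, shuffle=True))

print(distraction_stability(model_clean, model_distracted,
                            DataLoader(held_out, batch_size=128)))

Under these assumptions, a smaller score indicates that fitting the OoD examples perturbed the predictions less, which, per the abstract, should correlate with better generalization.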

Publication status

published

Volume

36 (2)

Pages / Article No.

3329–3342

Publisher

IEEE

Subject

Deep learning generalization; explainable artificial intelligence (AI); learning mechanism; neural networks (NNs)
