NNG-Mix: Improving Semi-Supervised Anomaly Detection With Pseudo-Anomaly Generation


METADATA ONLY

Date

2025-06

Publication Type

Journal Article

ETH Bibliography

yes

Abstract

Anomaly detection (AD) is essential in identifying rare and often critical events in complex systems, finding applications in fields such as network intrusion detection, financial fraud detection, and fault detection in infrastructure and industrial systems. While AD is typically treated as an unsupervised learning task due to the high cost of label annotation, it is more practical to assume access to a small set of labeled anomaly samples from domain experts, as is the case for semi-supervised AD. Semi-supervised and supervised approaches can leverage such labeled data, resulting in improved performance. In this article, rather than proposing a new semi-supervised or supervised approach for AD, we introduce a novel algorithm for generating additional pseudo-anomalies on the basis of the limited labeled anomalies and a large volume of unlabeled data. This serves as an augmentation to facilitate the detection of new anomalies. Our proposed algorithm, named nearest neighbor Gaussian mix-up (NNG-Mix), efficiently integrates information from both labeled and unlabeled data to generate pseudo-anomalies. We compare the performance of this novel algorithm with commonly applied augmentation techniques, such as Mixup and Cutout. We evaluate NNG-Mix by training various existing semi-supervised and supervised AD algorithms on the original training data along with the generated pseudo-anomalies. Through extensive experiments on 57 benchmark datasets in ADBench, reflecting different data types, we demonstrate that NNG-Mix outperforms other data augmentation methods. It yields significant performance improvements compared to the baselines trained exclusively on the original training data. Notably, NNG-Mix yields improvements of up to 16.4%, 8.8%, and 8.0% on the Classical, CV, and NLP datasets in ADBench, respectively.
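
The abstract names the ingredients of the method (nearest neighbors, Gaussian noise, mix-up over labeled anomalies and unlabeled data) but not its details. The following is a minimal sketch of how such ingredients could be combined, not the authors' implementation. The function name `nn_gaussian_mixup`, the Euclidean distance, the Beta-distributed mixing weight, and the parameters `k`, `alpha`, and `noise_std` are all assumptions made for illustration.

```python
import numpy as np


def nn_gaussian_mixup(anomalies, unlabeled, n_samples, k=5,
                      alpha=0.2, noise_std=0.01, seed=0):
    """Hypothetical sketch of pseudo-anomaly generation.

    Each pseudo-anomaly is a convex combination of a labeled anomaly
    and one of its k nearest unlabeled neighbors, with Gaussian noise
    added to both points first. All defaults are illustrative guesses.
    """
    rng = np.random.default_rng(seed)
    anomalies = np.asarray(anomalies, dtype=float)
    unlabeled = np.asarray(unlabeled, dtype=float)
    k = min(k, len(unlabeled))

    # Draw seed anomalies with replacement from the small labeled set.
    seeds = anomalies[rng.integers(0, len(anomalies), size=n_samples)]

    # Euclidean distance from every seed to every unlabeled point
    # (assumed metric), shape (n_samples, n_unlabeled).
    dists = np.linalg.norm(seeds[:, None, :] - unlabeled[None, :, :], axis=2)

    # Indices of the k nearest unlabeled points for each seed.
    nn_idx = np.argsort(dists, axis=1)[:, :k]

    # Pick one of those k neighbors uniformly at random per seed.
    pick = rng.integers(0, k, size=n_samples)
    partners = unlabeled[nn_idx[np.arange(n_samples), pick]]

    # Perturb both endpoints with isotropic Gaussian noise.
    seeds = seeds + rng.normal(0.0, noise_std, size=seeds.shape)
    partners = partners + rng.normal(0.0, noise_std, size=partners.shape)

    # Mixup weight drawn from Beta(alpha, alpha), one per sample.
    lam = rng.beta(alpha, alpha, size=(n_samples, 1))
    return lam * seeds + (1.0 - lam) * partners
```

Under these assumptions, the returned array would be labeled as anomalous and appended to the original training set before fitting a semi-supervised or supervised detector.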

Publication status

published

Volume

36 (6)

Pages / Article No.

10635 - 10647

Publisher

IEEE

Subject

Anomaly detection (AD); data augmentation; mixup; nearest neighbors (NNs); semi-supervised learning

Organisational unit

03890 - Chatzi, Eleni / Chatzi, Eleni
02261 - Center for Sustainable Future Mobility / Center for Sustainable Future Mobility