NNG-Mix: Improving Semi-Supervised Anomaly Detection With Pseudo-Anomaly Generation
METADATA ONLY
Date
2025-06
Publication Type
Journal Article
ETH Bibliography
yes
Abstract
Anomaly detection (AD) is essential for identifying rare and often critical events in complex systems, with applications in network intrusion detection, financial fraud detection, and fault detection in infrastructure and industrial systems. While AD is typically treated as an unsupervised learning task due to the high cost of label annotation, it is more practical to assume access to a small set of labeled anomaly samples from domain experts, as is the case for semi-supervised AD. Semi-supervised and supervised approaches can leverage such labeled data, resulting in improved performance. In this article, rather than proposing a new semi-supervised or supervised approach for AD, we introduce a novel algorithm for generating additional pseudo-anomalies on the basis of the limited labeled anomalies and a large volume of unlabeled data. These pseudo-anomalies serve as an augmentation to facilitate the detection of new anomalies. Our proposed algorithm, named nearest neighbor Gaussian mix-up (NNG-Mix), efficiently integrates information from both labeled and unlabeled data to generate pseudo-anomalies. We compare the performance of this algorithm with commonly applied augmentation techniques, such as Mixup and Cutout. We evaluate NNG-Mix by training various existing semi-supervised and supervised AD algorithms on the original training data along with the generated pseudo-anomalies. Through extensive experiments on 57 benchmark datasets in ADBench, reflecting different data types, we demonstrate that NNG-Mix outperforms other data augmentation methods and yields significant performance improvements over baselines trained exclusively on the original training data. Notably, NNG-Mix yields improvements of up to 16.4%, 8.8%, and 8.0% on the Classical, CV, and NLP datasets in ADBench, respectively.
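The abstract names the ingredients of the method (nearest neighbors, Gaussian noise, mix-up) but not its exact steps, so the following is only an illustrative sketch of how those ingredients could be combined, not the authors' implementation. The function name `nn_gaussian_mixup` and the parameters `k`, `alpha`, and `noise_std` are hypothetical, as is the choice of a Beta-distributed mixing weight, which is borrowed from standard Mixup.

```python
import numpy as np

def nn_gaussian_mixup(anomalies, unlabeled, n_new, k=10, alpha=0.2,
                      noise_std=0.01, seed=0):
    """Generate n_new pseudo-anomalies (hypothetical sketch, not the paper's code).

    anomalies: (n_a, d) labeled anomaly samples
    unlabeled: (n_u, d) unlabeled samples
    """
    rng = np.random.default_rng(seed)
    anomalies = np.asarray(anomalies, dtype=float)
    unlabeled = np.asarray(unlabeled, dtype=float)
    k = min(k, len(unlabeled))  # cannot ask for more neighbors than exist
    pseudo = np.empty((n_new, anomalies.shape[1]))
    for i in range(n_new):
        # Pick one labeled anomaly at random.
        a = anomalies[rng.integers(len(anomalies))]
        # Find its k nearest unlabeled neighbors (Euclidean distance).
        dists = np.linalg.norm(unlabeled - a, axis=1)
        nn_idx = np.argpartition(dists, k - 1)[:k]
        # Pick one of those neighbors as the mixing partner.
        u = unlabeled[rng.choice(nn_idx)]
        # Assumption: perturb both samples with isotropic Gaussian noise.
        a_noisy = a + rng.normal(0.0, noise_std, size=a.shape)
        u_noisy = u + rng.normal(0.0, noise_std, size=u.shape)
        # Assumption: Mixup-style convex combination with lam ~ Beta(alpha, alpha).
        lam = rng.beta(alpha, alpha)
        pseudo[i] = lam * a_noisy + (1.0 - lam) * u_noisy
    return pseudo
```

Under these assumptions, the generated rows would be appended to the training set with the anomaly label before fitting a semi-supervised or supervised detector. Restricting the mixing partner to nearby unlabeled points is one plausible way to keep pseudo-anomalies close to known anomalies rather than to arbitrary, mostly normal, data.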
Publication status
published
Volume
36 (6)
Pages / Article No.
10635 - 10647
Publisher
IEEE
Subject
Anomaly detection (AD); data augmentation; mixup; nearest neighbors (NNs); semi-supervised learning
Organisational unit
03890 - Chatzi, Eleni / Chatzi, Eleni
02261 - Center for Sustainable Future Mobility / Center for Sustainable Future Mobility