FastTextDodger: Decision-Based Adversarial Attack Against Black-Box NLP Models With Extremely High Efficiency
Abstract
Recently, achieving query-efficient adversarial example attacks against black-box natural language models has attracted widespread attention from researchers. This task is considered difficult due to the discrete nature of text, limited knowledge of the target model, and strict query-access limits in real-world systems. Existing attacks often require a large number of queries or achieve low attack success rates, and thus fail to meet practical requirements. To address this, we propose FastTextDodger, a simple and compact decision-based black-box textual adversarial attack that generates grammatically correct adversarial texts with high attack success rates and few queries. Experimental results show that FastTextDodger achieves an impressive 97.4% attack success rate on benchmark datasets and models while requiring only about 200 queries. Compared to state-of-the-art attacks, FastTextDodger needs only one-tenth as many queries on text classification and entailment tasks while maintaining comparable attack success rates and perturbed-word rates.
Publication status
published
Journal / series
IEEE Transactions on Information Forensics and Security
Publisher
IEEE
Subject
Adversarial attacks; black-box attacks; natural language processing
ETH Bibliography
yes