Machine translation testing via pathological invariance
METADATA ONLY
Date
2020-11
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
Machine translation software has become heavily integrated into our daily lives due to recent improvements in the performance of deep neural networks. However, machine translation software has been shown to regularly return erroneous translations, which can lead to harmful consequences such as economic loss and political conflict. Additionally, due to the complexity of the underlying neural models, testing machine translation systems presents new challenges. To address this problem, we introduce a novel methodology called PatInv. The main intuition behind PatInv is that sentences with different meanings should not have the same translation. Under this general idea, we provide two realizations of PatInv that, given an arbitrary sentence, generate syntactically similar but semantically different sentences by (1) replacing one word in the sentence using a masked language model or (2) removing one word or phrase from the sentence based on its constituency structure. We then test whether the returned translations are the same for the original and modified sentences. We have applied PatInv to test Google Translate and Bing Microsoft Translator using 200 English sentences. Two language settings are considered: English-Hindi (En-Hi) and English-Chinese (En-Zh). The results show that PatInv can accurately find 308 erroneous translations in Google Translate and 223 erroneous translations in Bing Microsoft Translator, most of which cannot be found by state-of-the-art approaches. © 2020 ACM
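The core check described in the abstract (different meanings should not share a translation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `find_pathological_invariants` and the stand-in translator `toy_translate` are hypothetical, the variants are written by hand rather than generated with a masked language model or a constituency parser, and translations are compared by exact string equality as a simplifying assumption.

```python
from typing import Callable, Iterable, List, Tuple


def find_pathological_invariants(
    original: str,
    variants: Iterable[str],
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (variant, translation) pairs whose translation equals the original's.

    Each variant is assumed to differ in meaning from `original`, so an
    identical translation is reported as a suspected translation error.
    """
    reference = translate(original).strip()
    suspicious = []
    for variant in variants:
        if variant == original:
            continue  # identical input trivially yields identical output
        candidate = translate(variant).strip()
        if candidate == reference:
            suspicious.append((variant, candidate))
    return suspicious


def toy_translate(sentence: str) -> str:
    """Hypothetical stand-in for a real translation service.

    It upper-cases every word and silently drops "not", which mimics a
    translator that loses a negation.
    """
    return " ".join(w.upper() for w in sentence.split() if w.lower() != "not")


original = "the movie was not good"
variants = [
    "the movie was good",     # hand-written variant: one word removed
    "the movie was not bad",  # hand-written variant: one word replaced
]
reports = find_pathological_invariants(original, variants, toy_translate)
for variant, translation in reports:
    print(f"suspicious: {variant!r} -> {translation!r}")
```

With the toy translator, only the removal variant is flagged, because dropping "not" makes its translation identical to that of the original sentence. To try this against a real service, `toy_translate` would be swapped for a function that calls that service.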
Publication status
published
Book title
Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Pages / Article No.
863 - 875
Publisher
Association for Computing Machinery
Event
28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2020) (virtual)
Subject
Testing; Machine translation; Pathological Invariance
Organisational unit
09628 - Su, Zhendong / Su, Zhendong
Notes
Due to the coronavirus (COVID-19) pandemic, the conference was held virtually.