AI-Assisted Human Evaluation of Machine Translation
OPEN ACCESS
Date
2024-09-17
Publication Type
Working Paper
ETH Bibliography
yes
Abstract
Annually, research teams spend large amounts of money evaluating the quality of machine translation systems (WMT, inter alia). This is expensive because it requires substantial expert human labor. The recently adopted annotation protocol, Error Span Annotation (ESA), has annotators mark erroneous parts of the translation and then assign a final score. Much of the annotator time is spent scanning the translation for possible errors. In our work, we help the annotators by pre-filling the error annotations with recall-oriented automatic quality estimation. With this AI assistance, we obtain annotations at the same quality level while cutting the time per span annotation by more than half (71s/error span $\rightarrow$ 31s/error span). The biggest advantage of the ESA$^\mathrm{AI}$ protocol is the accurate priming of annotators (pre-filled error spans) before they assign the final score. Such priming could introduce automation bias, but we confirm this effect to be low. In addition, the annotation budget can be reduced by almost 25\% by filtering out examples that the AI deems very likely to be correct.
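The workflow the abstract describes can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `quality_estimator` callable, the `ErrorSpan` fields, and the `skip_threshold` value are all hypothetical stand-ins for the paper's recall-oriented QE model and filtering criterion.

```python
from dataclasses import dataclass

@dataclass
class ErrorSpan:
    start: int     # character offset where the suspected error begins
    end: int       # character offset where it ends (exclusive)
    severity: str  # e.g. "minor" or "major"

def esa_ai_prefill(segments, quality_estimator, skip_threshold=0.95):
    """Split translated segments into (a) those sent to human annotators
    with pre-filled error spans and (b) those the QE model deems very
    likely correct, which can be skipped to reduce the annotation budget.

    `quality_estimator` is a hypothetical callable returning a tuple
    (confidence_correct, [ErrorSpan, ...]) for one translated segment.
    """
    to_annotate, skipped = [], []
    for seg in segments:
        confidence, spans = quality_estimator(seg)
        if confidence >= skip_threshold and not spans:
            skipped.append(seg)  # no human annotation needed
        else:
            # annotator starts from the pre-filled spans instead of
            # scanning the whole translation from scratch
            to_annotate.append((seg, spans))
    return to_annotate, skipped
```

In the ESA$^\mathrm{AI}$ setting, the annotator then edits the pre-filled spans (adding, removing, or adjusting them) before assigning the segment's final score; the sketch only covers the pre-filling and filtering steps.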
Publication status
published
Pages / Article No.
2406.12419
Publisher
Cornell University
Edition / version
v2
Subject
Computation and Language (cs.CL); FOS: Computer and information sciences
Organisational unit
09684 - Sachan, Mrinmaya / Sachan, Mrinmaya