This report presents guidelines for training specialized neural machine translation (NMT) systems for a narrow textual domain, namely social surveys, where the MT model must handle domain-specific terminology. The work demonstrates how relatively low-resource in-domain corpora can be used to prepare these specialized models. All described models are compatible with the packaged MT framework described in Deliverable D4.5, and the best-performing models are available in the Lindat repository. The code used in the training pipeline (for experiment reproduction) is available on GitHub, distributed under the Mozilla Public License 2.0.
The partners also describe the full translation pipeline, including file sharing and preprocessing, that was used to support automatic translation of the Covid-19 surveys into English. While the description of the pipeline is general enough to be reused in future projects, the code published by the partners on GitHub serves only as an example of a task-specific solution.