D4.6 Guidelines for further use of MT systems in social surveys

Submission date:

28 June 2022

This report presents guidelines for training specialized neural machine translation (NMT) systems for a narrow textual domain, namely social surveys, where the MT model must be able to handle domain-specific terminology. The work presented in this report demonstrates how relatively low-resource in-domain corpora can be used to prepare these specialized models. All described models are compatible with the packaged MT framework described in Deliverable D4.5, and the best-performing models are available at the Lindat repository. The code used in the training pipeline (for experiment reproduction) is available on GitHub, distributed under the Mozilla Public License 2.0.

Partners also describe the full translation pipeline, including file sharing and preprocessing, that was used to help with automatic translation of the Covid-19 surveys into English. While the description of the pipeline is general enough to be reused in future projects, the code published by partners on GitHub serves only as an example of a task-specific solution.

Document:

D4.6 Guidelines for further use of MT systems in social surveys

Publication type:

Deliverable

Catalogue:

Multilingual Terminologies
