WP4: Innovations in Data Production

D4.2 Ready to use sample management system

This document describes the output of Task 4.1 of the SSHOC (Social Sciences and Humanities Open Cloud) project funded by the European Commission under Grant Agreement N° 823782.

A technical infrastructure called Web Panel Sample Service (WPSS) has been implemented following the specifications published in November 2019 as deliverable 4.11 "A sample management system for cross-national web survey".

D4.3 Survey specific parallel corpora

This document describes the Multilingual Corpus of Survey Questionnaires (MCSQ), a database of survey questionnaire texts. The report summarizes technical information about Version 1.0 (Ada Lovelace) of the MCSQ, dated June 2020. It links to the repository containing the code and files that generate the database.

D4.4 Guidelines for building survey-specific corpora

D4.5 Packaged tested version of MT system

This document describes the packaging and release of the CUNI machine translation (MT) systems developed for task T4.5 and trained within the Tensor2Tensor framework for sequence-to-sequence learning. The MT backend, together with a simple command-line interface (CLI), is released separately from the models, as a Docker image that provides a platform-independent solution.

D4.6 Guidelines for further use of MT systems in social surveys

This report describes guidelines that can be applied for training specialized neural machine translation (NMT) systems aimed at translation in a narrow textual domain, namely the domain of social surveys, requiring a specialized MT model that is able to handle domain-specific terminology. The work presented in this report demonstrates how relatively low-resource in-domain corpora can be used to prepare these specialized models.

D4.7 Code for data exchange between TMT and open-source CAT software

This document is a report accompanying the SSHOC D4.7 Code for data exchange between TMT and open-source CAT software. The team has explored possibilities for data exchange between TMT and CAT tools, specifically MateCat and MyMemory, finding three areas where such a connection would be worthwhile to develop. As a first exploration, the team has focussed on TMT and MyMemory for single segment translation suggestions, resulting in the development of a demo tool.

D4.9 Guidelines on the use of Translation Memories in survey translation

Task 4.3 in WP4 Innovations in Data Production of the SSHOC project is dedicated to Applying Computer-Assisted Translation tools in Social Surveys. A key activity of this task is to incorporate newly created Translation Memories (TMs), built from a corpus developed in Task 4.2 (Preparing tools for the use of Computer Assisted Translation), into an open-source computer-assisted translation (CAT) environment. This report also lays out a test case to demonstrate the feasibility of using TMs within a CAT environment.

MS19 Consultation with SSH data producers

This text concerns the achievement of milestone MS19, “Consultation with SSH data producers completed”. SSHOCro will be a common meta-level schema to be used as a top-level ontology for organizing knowledge and information distributed across various data resources in the SSH open cloud. This will be achieved by providing a common, agreed-upon understanding of the concepts, entities and the relationships holding between them, in order to enable knowledge sharing, information exchange and integration between heterogeneous sources.

D4.8 Report on possibilities for incorporating open source CAT tool functionality into the TMT

This report reviews potential mechanisms to integrate a Translation Memory (TM) solution into the Translation Management Tool (TMT), allowing large international surveys to improve their translation processes and deliver faster, higher-quality translations. This will include further research into integration with other external CAT tools and the development of a stand-alone Translation Memory tool to which the TMT will be connected, followed by the evaluation and implementation of various TM matching algorithms and the sharing of TMT data with partners via the TM solution.


MS17 Open source CAT TM software selected

This report documents the selection criteria for an open source Computer Assisted Translation tool with Translation Memory functionalities that will be used in the translation research activities of Task 4.3 of the SSHOC project. The Task team describes the role of the milestone in the Task and the means of verification.