WP4: Innovations in Data Production

D4.17 New version of the Aïoli platform

Archaeologists, architects, engineers, materials specialists, teachers, curators, and restorers of cultural property contribute daily to the knowledge and conservation of heritage artefacts. For many years, the development of digital technologies has produced important results in the collection, visualisation and indexing of digital resources.

D4.18 SSHOC Reference Ontology (beta version)

This report describes the SSHOC Reference Ontology (SSHOCro), a common meta-level schema based on CIDOC CRM, to provide a semantic interoperability framework for the description of the data life cycle used by Social Science and Humanities researchers. The SSHOCro is provided in RDF/S in the file titled “D4.18 SSHOCro_v.1.0_beta.rdf”, which is submitted as an attachment to this report.

D4.19 Mapping of two indicative selected standards to the SSHOCro

This report documents the work undertaken within project Task 4.7, Modeling the SSHOC data life cycle, and describes the process of mapping the social science research metadata standards DDI Codebook and CMDI to the SSHOC Reference Ontology (SSHOCro). The resulting mapping rules are also documented.

D4.2 Ready to use sample management system

This document describes the output of Task 4.1 of the SSHOC (Social Sciences and Humanities Open Cloud) project funded by the European Commission under Grant Agreement N° 823782.

A technical infrastructure called Web Panel Sample Service (WPSS) has been implemented following the specifications published in November 2019 as deliverable 4.11 "A sample management system for cross-national web survey".

D4.3 Survey specific parallel corpora

This document describes the Multilingual Corpus of Survey Questionnaires (MCSQ), a database of survey questionnaires’ texts. The report summarizes technical information about Version 1.0 (Ada Lovelace) of the MCSQ, dated June 2020. It links to the repository to access the code and files generating the database.

D4.4 Guidelines for building survey-specific corpora

The compilation of the Multilingual Corpus of Survey Questionnaires (MCSQ)

D4.5 Packaged tested version of MT system

This document describes the packaging and release of the CUNI machine translation (MT) systems developed for task T4.5 and trained within the Tensor2Tensor framework for sequence-to-sequence learning. The MT backend, together with a simple command-line interface (CLI), is released separately from the models. The MT backend is released as a separate Docker image as a platform-independent solution.

D4.7 Code for data exchange between TMT and open-source CAT software

This document is a report accompanying the SSHOC D4.7 Code for data exchange between TMT and open-source CAT software. The team has explored possibilities for data exchange between TMT and CAT tools, specifically MateCat and MyMemory, finding three areas where such a connection would be worthwhile to develop. As a first exploration, the team has focussed on TMT and MyMemory for single segment translation suggestions, resulting in the development of a demo tool.

D4.9 Guidelines on the use of Translation Memories in survey translation

Task 4.3 in WP4 Innovations in Data Production of the SSHOC project is dedicated to Applying Computer-Assisted Translation tools in Social Surveys. A key activity of this task is to incorporate newly created Translation Memories (TMs) from a corpus, which has been developed in Task 4.2 (Preparing tools for the use of Computer Assisted Translation), into an open-source computer-assisted translation (CAT) environment. Moreover, this report lays out a test case to demonstrate the feasibility of using TMs within a CAT environment.

MS19 Consultation with SSH data producers

This text concerns the achievement of MS19 “Consultation with SSH data producers completed”. SSHOCro will be a common meta-level schema to be used as a top-level ontology for organizing knowledge and information distributed across various data resources in the SSH open cloud. This will be achieved by providing a common, agreed-upon understanding of the concepts, entities and the relationships holding between them, in order to enable knowledge sharing, information exchange and integration between heterogeneous sources.