Service Catalogue | SSHOPENCLOUD

Between January 2019 and April 2022 SSHOC delivers a series of services and tools for daily use by SSH researchers. All tools will be made available via the SSH Open Marketplace by April 2022.


Dataset: investigating the effects of machine translation and post-editing in survey translation

Dataset of two experiments in the language pairs English-Russian and English-German. The dataset contains linguistic data (segments translated using different methods) and human data (questionnaires filled in by participants). It also contains the translation versions produced at the different steps of a team translation approach.

Type:

Dataset

Property:

Datasets

SSHOC Resources:

D8.1 Governance and Sustainability Roadmap

Improved and FAIR data Repositories - SHARE survey data

Technological development continues to offer ways of collecting health data that go beyond asking survey questions. Such new types of data were collected in the SHARE survey by means of Dried Blood Spot Samples (DBSS) and accelerometers. These data demand new data protection rules and must often be extensively processed, validated and calibrated before they can be made accessible. SHARE and CentERdata develop a strategy to provide access to such data according to FAIR principles.

Type:

Dataset

Property:

Datasets

SSHOC Resources:

D8.1 Governance and Sustainability Roadmap

D5.4 User friendly data release, including user guide, of one biomedical data set linked to survey data and including metadata (Access to biomedical data)

Multilingual Terminologies

Many different domains lack multilingual terminological resources. Making data and services accessible and usable in SSH is very much a matter of providing terminology across languages and multilingual vocabularies. Shortage of multilingual terminologies and vocabularies represents an obstacle to the access and reuse of information. Using the appropriate vocabularies can greatly improve both discovery and classification. Consequently, for SSHOC, it is important to address this issue with respect to the SSH domain.

Type:

Dataset

Property:

Datasets

Accessible at:

SSHOC Vocabularies access

Data are deposited in the ILC4CLARIN centre

SSHOC Resources:

D8.1 Governance and Sustainability Roadmap

D4.6 Guidelines for further use of MT systems in social surveys

surveycodings.org

Surveycodings offers a host of social science codings measuring individual and socio-economic variables. The developed tools consist of a multilingual repository containing questionnaires, data collection tools, and coding frames based on standard statistical classifications covering a large number of countries.

Property:

Processing & analysis

Accessible at:

SurveyCodings

SSHOC Resources:

D3.4 Multilingual ontologies for Occupation, Industry, Regions and cities, Food items, and Religion, with use case

Automatic Verification Tool

The Automatic Verification Tool (AVT) enables the user to verify translations using Bilingual Word Embeddings and to report to the translators a set of translated questions to be re-checked.

The AVT imports the questions and makes use of a trained Bilingual Word Embeddings model. It generates the 10 best foreign-language translations of each English word.
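As an illustration only, the ranking step described above can be sketched as a nearest-neighbour lookup in a shared embedding space. This is not the AVT's actual code: the function names, the toy vectors, the use of cosine similarity, and the coverage score used to decide which questions to re-check are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_translations(word, source_vectors, target_vectors, k=10):
    """Return up to k target-language words closest to `word` (hypothetical helper)."""
    query = source_vectors[word]
    ranked = sorted(
        target_vectors,
        key=lambda t: cosine(query, target_vectors[t]),
        reverse=True,
    )
    return ranked[:k]


def coverage(source_words, translated_words, source_vectors, target_vectors, k=10):
    """Share of known source words with at least one top-k candidate in the translation.

    Assumed criterion: a low score suggests the question should be re-checked.
    """
    known = [w for w in source_words if w in source_vectors]
    if not known:
        return 0.0
    translated = set(translated_words)
    hits = sum(
        1
        for w in known
        if translated & set(top_k_translations(w, source_vectors, target_vectors, k))
    )
    return hits / len(known)


# Toy two-dimensional vectors, invented for the example; a real bilingual
# model would supply vectors with hundreds of dimensions.
en_vectors = {"house": [1.0, 0.0], "dog": [0.0, 1.0]}
de_vectors = {"Haus": [0.9, 0.1], "Hund": [0.1, 0.9], "Gebäude": [0.8, 0.3]}

candidates = top_k_translations("house", en_vectors, de_vectors, k=2)
good_score = coverage(["house", "dog"], ["Haus", "Hund"], en_vectors, de_vectors, k=1)
poor_score = coverage(["house", "dog"], ["Haus", "Haus"], en_vectors, de_vectors, k=1)
```

In this sketch a translation whose score falls below some chosen threshold would be reported back to the translators; the threshold itself would have to be tuned on real survey data.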

Type:

Demonstrator

Property:

Processing & analysis

Accessible at:

GitHub

Demo

SSHOC Resources:

MS18 Beta version of automatic verification software available for testing

D4.11 Report on the experience with the automatic verification programme in SHARE wave 9

D8.1 Governance and Sustainability Roadmap

Audio Survey Experiment

The Audio Survey Experiment provides guidelines describing how to integrate the collection of digital language data into the traditional social sciences data collection process and provides audio transcript data which can be analysed with the help of natural language processing tools.

Type:

Dataset

Property:

Datasets