19 July 2021

The compilation of the [MCSQ]: Multilingual Corpus of Survey Questionnaires

This report describes the design and implementation of the [MCSQ]: Multilingual Corpus of Survey Questionnaires (MCSQ), a database of survey questionnaire texts. It documents the research output of Task 4.2: Preparing tools for the use of Computer Assisted Translation, of the Social Sciences and Humanities Open Cloud (SSHOC) project. Using this database as an example, the deliverable aims to provide guidelines on creating corpora in survey research. This document is closely related to Deliverable 4.3: Survey specific parallel corpora: the [MCSQ]: Multilingual Corpus of Survey Questionnaires, which corresponds to the database itself and its source code. The report is based on the compilation of version 1.0 (Ada Lovelace), dated June 2020.

The corpus is compiled from the European Social Survey (ESS) and the European Values Study (EVS) questionnaires in the English source language and their translations into Catalan, Czech, French (produced for France, Switzerland, Belgium and Luxembourg), German (produced for Austria, Germany, Switzerland and Luxembourg), Norwegian, Portuguese, Spanish and Russian (produced for Israel, Latvia, Lithuania, the Russian Federation, Ukraine and Estonia).

To prepare the social sciences for greater adoption of gold standards in translation procedures, such as Computer-Assisted Translation (CAT) tools or translation memories, domain-specific corpora of survey questionnaires are needed. In line with the SSHOC project's focus on open-source and open-access principles, this corpus is openly accessible (in a format compatible with CAT tools) and will be an important resource for corpus linguists, computational linguists, statisticians, typologists and social scientists, as well as translation scholars and localizers. The planned version 2.0 (Mileva Marić-Einstein) will expand the corpus to include the Survey of Health, Ageing and Retirement in Europe (SHARE) questionnaires. Within the SSHOC project, part of the data in the MCSQ will be used to conduct translation research in Task 4.3: Applying Computer Assisted Translation tools in Social Surveys.

