The European Question Bank (EQB) aims to provide a central search facility across all survey holdings of the Consortium of European Social Science Data Archives (CESSDA). It uses a question-level metadata schema based on the DDI-Lifecycle standard (DDI Alliance 2020). The goal is to let users find survey questions, their translations into the applicable languages, answer categories, pre- and post-question texts, and the study title. Populating the EQB with the multilingual, continuous online WageIndicator Salary Survey (Tijdens & Osse) provides an important use case for testing the feasibility of harvesting social science questionnaires for the EQB. The survey started in the Netherlands in 2000. It currently collects data in 160 countries around the world and can be completed in 58 languages, with the questions translated for each country.
This report describes how all required question-level metadata (survey questions, their translations into the applicable languages, answer categories, pre- and post-question texts, and the study title) were imported into the EQB. Because the WageIndicator metadata were already structured, extracting all relevant questionnaire texts posed no challenges. The structured metadata were converted to the DDI-Lifecycle format required by the EQB using conversion scripts that can be reused in the future. The metadata published in the beta version of the online EQB give users access to the survey questions, their translations into the applicable languages, answer categories, and pre- and post-question texts.
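The conversion scripts themselves are not described here. The following minimal Python sketch illustrates what such a step can look like: one structured, multilingual question record becomes a question item with per-language texts and translated answer categories. The input record layout, the `to_question_item` function and the example wording are hypothetical. The element names are only loosely modelled on DDI-Lifecycle (`QuestionItem`, `QuestionText`, `LiteralText`, `Text`), without DDI namespaces or schema validation.

```python
import xml.etree.ElementTree as ET

# ElementTree predefines the "xml" prefix, so this key serialises as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_question_item(record: dict) -> ET.Element:
    """Convert one hypothetical source question record into a simplified,
    DDI-Lifecycle-style QuestionItem element (not schema-validated)."""
    item = ET.Element("QuestionItem", {"id": record["id"]})

    # One Text element per language version of the question wording.
    question_text = ET.SubElement(item, "QuestionText")
    literal = ET.SubElement(question_text, "LiteralText")
    for lang, text in record["texts"].items():
        ET.SubElement(literal, "Text", {XML_LANG: lang}).text = text

    # Answer categories: one Code per value, one Label per language.
    domain = ET.SubElement(item, "CodeDomain")
    for value, labels in record.get("categories", []):
        code = ET.SubElement(domain, "Code", {"value": str(value)})
        for lang, label in labels.items():
            ET.SubElement(code, "Label", {XML_LANG: lang}).text = label
    return item


if __name__ == "__main__":
    # Hypothetical example record, not actual WageIndicator wording.
    record = {
        "id": "q_gender",
        "texts": {"en": "Are you male or female?", "nl": "Bent u een man of een vrouw?"},
        "categories": [
            (1, {"en": "Male", "nl": "Man"}),
            (2, {"en": "Female", "nl": "Vrouw"}),
        ],
    }
    print(ET.tostring(to_question_item(record), encoding="unicode"))
```

Keeping every translation as a sibling element tagged with `xml:lang` means that adding another language version only adds elements and leaves the structure unchanged. This matters when a survey has hundreds of language versions.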
Three main challenges were encountered in populating the EQB with the multilingual WageIndicator Salary Survey. First, the inclusion of question grids caused problems for the search function, because the search terms end users were most likely to enter appeared not in the question texts but in the answer categories. This was solved by expanding the search function and by breaking the relevant question grids down into separate questions. Second, some questions have extremely long lists of answer categories, which are commonly stored outside the survey XML sheet and retrieved via APIs. This made it difficult both to include all answer categories in the EQB and to display them to end users. The EQB now provides access to these API-based answer categories and their translations by linking to the source API. This ensures that users have access to the most up-to-date versions of the ontologies and vocabularies, as well as to all translations. Third, the large number of language versions of the WageIndicator survey placed high demands on the IT system, which must correctly process numerous writing systems, and on the user-friendliness of the web interface. The beta version of the EQB can process all included writing systems. To keep the site user-friendly, the translations for the 217 language versions are collapsed by default; users can click to expand them and access all translations.
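The question-grid fix can be illustrated with a minimal Python sketch. It is not the EQB's actual implementation. The `Question` and `Grid` classes, the `split_grid` and `search` functions and the example grid are all hypothetical. The sketch also assumes that "expanding the search function" means matching search terms against answer categories as well as question texts.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    qid: str
    text: str
    categories: list[str] = field(default_factory=list)


@dataclass
class Grid:
    """A question grid: one shared stem, several items (rows), one shared scale."""
    qid: str
    stem: str
    items: list[str]
    scale: list[str]


def split_grid(grid: Grid) -> list[Question]:
    """Break a grid into one stand-alone question per item, so that the
    item wording becomes part of a searchable question text."""
    return [
        Question(f"{grid.qid}_{i}", f"{grid.stem} {item}", list(grid.scale))
        for i, item in enumerate(grid.items, start=1)
    ]


def search(questions: list[Question], term: str) -> list[str]:
    """Case-insensitive match on the question text OR any answer category."""
    term = term.casefold()
    return [
        q.qid
        for q in questions
        if term in q.text.casefold()
        or any(term in c.casefold() for c in q.categories)
    ]


if __name__ == "__main__":
    # Hypothetical grid, not actual WageIndicator wording.
    grid = Grid(
        qid="q_sat",
        stem="How satisfied are you with",
        items=["your pay?", "your working hours?"],
        scale=["Dissatisfied", "Satisfied"],
    )
    questions = split_grid(grid)
    for q in questions:
        print(q.qid, "|", q.text)
    print(search(questions, "pay"))  # ['q_sat_1']
```

Each split question keeps its own copy of the shared scale, so the answer categories stay searchable after the split. Searching the categories as well as the text also covers grids that are left intact, where the item wording is stored as answer categories.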