24 March 2020 - 10:00 to 12:00
Utrecht, The Netherlands


Organised by SSHOC, the workshop "Linking Social Survey and Linguistic Infrastructures through EOSC" will be held in Utrecht on 24 March 2020 (10am-12pm CET, registration from 9.30am), co-located with the 3rd SSHOC Consortium meeting.

Workshop Description and Objectives

Survey infrastructures systematically interview tens of thousands of individuals across Europe each year. These people are randomly selected from the population and represent all walks of life. Each interview takes around an hour, during which respondents provide the survey infrastructure with a wide range of data about themselves that is valuable to researchers and, subsequently, policy makers. Yet a large proportion of the information conveyed in an interview is lost. Complex life histories and events are coded into structured taxonomies, which are necessary for cutting-edge sociological research, but many aspects of an individual's responses are thrown away. The respondent's tone, clarity, fluidity and depth of vocabulary can all be used to provide insights into concepts of interest to social scientists, such as cognitive function, socio-economic status or verbal reasoning skills. To make use of this lost data, however, it is necessary to integrate the tools of linguistic infrastructures into the analytical pipeline of survey infrastructures. This cross-pollination and integrated usage of tools is precisely what the EOSC aims to enable, and the work in Task 4.4 of SSHOC therefore seeks to provide a proof of concept and a framework for future research that explores this approach.

In this workshop there will be a series of presentations and collaborative discussions around the potential for integrating social survey and linguistic infrastructures.

  • Tom Emery will present work conducted by the GGP on capturing audio data through existing survey software in online interviews, and will provide initial evaluations of data quality.
  • Henk van den Heuvel from the Oral History team will then detail the tools used for the analysis of Oral History data, which could be adapted for the analysis of survey interviews. In particular, he will address the so-called Transcription Chain, the basis of which is automatic speech-to-text conversion. The resulting text can, after manual correction if needed, be processed by NLP tools to obtain more insight into its linguistic structure, or to carry out topic detection or text summarisation, to mention a few options.
  • This will be followed by an interactive discussion among participants on the potential applications of these tools and on new avenues of scientific enquiry that could be integrated into the next phase of work in Task 4.4.



The full agenda will be published soon.



Utrecht University
Room: Sweelinckzaal 0.05
Drift 21, 3512 BR Utrecht
The Netherlands