A high-quality orthographic transcript is the basis for all types of analyses of spoken language data. However, transcribing speech is a time-consuming and tedious task. Automatic speech recognition, together with NLP and text annotation tools, can make this task much quicker and save you a lot of time and frustration.
In this first in a series of SSHOC webinars, organised by the consortium partner CLARIN ERIC, we will discuss the theoretical basis and the technology available for transcribing spoken language. In particular, we will focus on the role of automatic speech recognition: what are the opportunities, what are the pitfalls, and where can it be applied successfully?
Introduction to working with interview data. Henk van den Heuvel - Head of the Humanities Lab at the Faculty of Arts, Radboud University Nijmegen - will briefly introduce the topic of using interviews as a research instrument and the cross-disciplinary nature of working with interview data. He will also give some background on the Transcription Chain initiative, which originated in oral history research but has much wider potential.
Demonstration of automatic transcription of speech. Christoph Draxler - Researcher at the Institute of Phonetics and Speech Processing at Ludwig Maximilian University Munich - will demonstrate the web portal for the automatic transcription of speech. This portal currently supports three languages (English, German, Dutch), with Italian and Czech in the pipeline. The portal provides a user-friendly interface, so that researchers without a technical background can use state-of-the-art recognizers, optimized annotation editors and powerful segmentation services to produce high-quality time-aligned transcripts. These transcripts are the basis for subsequent in-depth scientific analysis, e.g. topic modeling, analysis of linguistic structures, and named entity recognition.
Henk van den Heuvel
Christoph Draxler