Terminological resources are mostly described in English only, regardless of the domain. This is an obstacle to the access and reuse of information: such resources are easy for native English speakers to find, but can be harder to discover for researchers and users who speak other languages, so the content is not equally available to all. Multilingual metadata terms and vocabularies are therefore meant to provide access to content across languages and to improve the discovery of resources and tools by non-native English speakers.
The CLARIN Multilingual Metadata resource translates into several languages the set of 232 approved metadata concepts from the CLARIN Concept Registry (CCR), which were originally available in English. Their definitions are translated as well.
CLARIN metadata aim to describe language resources. In particular, they are meant to map and harmonise metadata from the various centres in the Virtual Language Observatory (VLO). Translating these metadata and their definitions makes it possible to describe in many languages the resources in the VLO, which are developed by researchers from different countries, and to pursue harmonisation with a multilingual approach. Non-native English speakers will thus be able to describe their resources more easily, as they can use their own language. Similarly, SSH researchers will be able to find and access such resources more easily.
The 232 approved CCR metadata and their definitions are translated into four languages: Dutch, French, Greek and Italian. Translations are obtained with state-of-the-art machine translation tools (DeepL, Google Translate, LINDAT Translation, Reverso) and manually checked by domain experts who are native speakers of the respective language.
The lack of multilingual terminological resources in different domains constitutes an obstacle to the access and reuse of information.
Multilingual metadata are available both in tabular format (CSV) and as a SKOS resource from the ILC4CLARIN repository.
Frontini, Francesca; Gamba, Federica; Monachini, Monica; Broeder, Daan, 2021, SSHOC Multilingual Metadata, ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", National Research Council, in Pisa, http://hdl.handle.net/20.500.11752/ILC-568.
They can also be explored online on the SSHOC Vocabularies Platform:
https://vocabs.sshopencloud.eu/vocabularies/sshocmm/
A complete description of the translation and validation of the metadata can be found in D3.9: https://www.sshopencloud.eu/d39-report-ontology-and-vocabulary-collection-and-publication
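As an illustration of how the tabular (CSV) distribution might be consumed, the following is a minimal Python sketch that groups translated labels by concept and language and falls back to English when a translation is missing. The column names (concept, lang, label), the helper functions load_labels and label_for, and the sample rows are assumptions made for this example only; they are not taken from the published file, whose actual layout should be checked before use.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample: the column names and rows below are illustrative
# placeholders, not copied from the actual CSV distribution.
SAMPLE_CSV = """concept,lang,label
language,en,language
language,nl,taal
language,fr,langue
language,it,lingua
"""


def load_labels(csv_file):
    """Return {concept: {lang: label}} from a CSV with the assumed columns."""
    labels = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        labels[row["concept"]][row["lang"]] = row["label"]
    return dict(labels)


def label_for(labels, concept, lang, fallback="en"):
    """Look up a label in `lang`, falling back to `fallback` when missing."""
    by_lang = labels.get(concept, {})
    return by_lang.get(lang, by_lang.get(fallback))


labels = load_labels(io.StringIO(SAMPLE_CSV))
print(label_for(labels, "language", "it"))
# The sample has no Greek row, so this lookup falls back to English.
print(label_for(labels, "language", "el"))
```

To work with the real file, io.StringIO(SAMPLE_CSV) would be replaced by a file opened with open(path, newline="", encoding="utf-8"), and the column names adjusted to match the actual header.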