Multilingual Search for Tools and Resources

Terminological resources are mostly described in English only, regardless of the domain. This is an obstacle to the access and reuse of information: such resources are easy for native English speakers to find, but can be harder to find for researchers and users who speak other languages, so content is not equally available to all. Multilingual metadata terms and vocabularies are therefore meant to provide access to content across languages and to improve the discovery of resources and tools by non-native English speakers.

What is the Resource for?

The CLARIN Multilingual Metadata resource translates the set of 232 approved metadata concepts from the CLARIN Concept Registry (CCR), originally available in English, into other languages. The definitions of the concepts are translated as well.

CLARIN metadata aim to describe language resources. In particular, they are meant to map and harmonise metadata from various centres in the Virtual Language Observatory (VLO). Translating these metadata and their definitions makes it possible to describe in many languages the resources in the VLO, which are developed by researchers from different countries, and to pursue harmonisation with a multilingual approach. Non-native English speakers will thus be able to describe their resources more easily, since they can use their own language. Similarly, SSH researchers will be able to find and access such resources more easily.

Resource Description

The 232 approved CCR metadata concepts and their definitions are translated into four languages: Dutch, French, Greek and Italian. The translations were produced with state-of-the-art Machine Translation tools (DeepL, Google Translate, LINDAT Translation, Reverso) and then manually checked by domain experts who are native speakers of each language.

 

 

Benefits and Findings

The lack of multilingual terminological resources in different domains constitutes an obstacle to the access and reuse of information.

  • How can multilingual metadata concepts help?
    • Multilingual metadata concepts provide access to content across different languages and greatly enhance the discoverability of SSH resources by non-native English speakers; SSH researchers can more easily work with a wider range of SSH content.
  • Can Machine Translation tools offer an effective solution to metadata concepts translation?
    • Machine Translation tools perform well, but their output still needs to be validated.

 

More Information

Multilingual metadata are available both in tabular format (CSV) and as a SKOS resource from the ILC4CLARIN repository.

Frontini, Francesca; Gamba, Federica; Monachini, Monica; Broeder, Daan, 2021, SSHOC Multilingual Metadata, ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", National Research Council, in Pisa, http://hdl.handle.net/20.500.11752/ILC-568.

The multilingual metadata can also be explored online on the SSHOC Vocabularies Platform:
https://vocabs.sshopencloud.eu/vocabularies/sshocmm/

A complete description of the translation and validation of the metadata can be found in D3.9: https://www.sshopencloud.eu/d39-report-ontology-and-vocabulary-collection-and-publication
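
As a rough illustration of how the tabular (CSV) version could support multilingual search, the sketch below builds a lookup from labels in any language to the matching concept. The column layout (`concept_id` plus one column per language code) and the sample rows are assumptions made for this example; they are not taken from the published file, whose actual structure should be checked before adapting the code.

```python
import csv
import io

# Hypothetical layout: one row per concept, one column per language code.
# These rows are illustrative only and are not copied from the actual resource.
SAMPLE_CSV = """concept_id,en,nl,fr,el,it
c1,language,taal,langue,γλώσσα,lingua
c2,author,auteur,auteur,συγγραφέας,autore
"""


def build_index(csv_text):
    """Map every lower-cased label, in any language, to its concept rows."""
    index = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, label in row.items():
            if column != "concept_id" and label:
                index.setdefault(label.strip().lower(), []).append(row)
    return index


def search(index, term, target_lang="en"):
    """Return the target-language labels of the concepts matching `term`."""
    matches = index.get(term.strip().lower(), [])
    return sorted({row[target_lang] for row in matches})


index = build_index(SAMPLE_CSV)
print(search(index, "lingua"))                    # Italian query, English label
print(search(index, "auteur", target_lang="it"))  # Dutch/French query, Italian label
```

With an index of this kind, a user can enter a term in Dutch, French, Greek or Italian and be pointed to the same concept that an English query would reach.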

SSHOC Events: