From the attic to the cloud: mobilization of endangered language resources with linked data

Monday, 15 June, 2020

ACL Anthology

This paper describes a collection of 20k ELAN annotation files harvested from five different endangered language archives. The ELAN files form a very heterogeneous set, but the hierarchical configuration of their tiers, in conjunction with the tier content, makes it possible to identify transcriptions, translations, and glosses. These can then be queried across archives. Small-scale analyses of graphemes (transcription tier), grammatical and lexical glosses (gloss tier), and semantic concepts (translation tier) show the viability of the approach. The use of identifiers from OLAC, Wikidata, and Glottolog allows for better integration of the data from these archives into the Linguistic Linked Open Data Cloud.
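As a rough illustration of what "hierarchical configuration of tiers" means in practice, the sketch below reads the tier hierarchy out of an ELAN file. It relies only on the `TIER` element and its `TIER_ID` and `PARENT_REF` attributes from the ELAN annotation format; the sample document, the tier names (`tx`, `ft`, `mb`, `ge`), and the helper functions are hypothetical and are not taken from the paper, which may proceed quite differently.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny inline stand-in for an ELAN .eaf file (hypothetical tier names).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="tx" LINGUISTIC_TYPE_REF="utterance"/>
  <TIER TIER_ID="ft" PARENT_REF="tx" LINGUISTIC_TYPE_REF="translation"/>
  <TIER TIER_ID="mb" PARENT_REF="tx" LINGUISTIC_TYPE_REF="morphemes"/>
  <TIER TIER_ID="ge" PARENT_REF="mb" LINGUISTIC_TYPE_REF="gloss"/>
</ANNOTATION_DOCUMENT>"""


def tier_children(eaf_xml):
    """Map each parent tier ID to its child tier IDs (key None holds top-level tiers)."""
    root = ET.fromstring(eaf_xml)
    children = defaultdict(list)
    for tier in root.iter("TIER"):
        # PARENT_REF is absent on top-level tiers, so .get() returns None for them.
        children[tier.get("PARENT_REF")].append(tier.get("TIER_ID"))
    return dict(children)


def tier_depths(eaf_xml):
    """Return the depth of every tier in the hierarchy (top-level tiers have depth 0)."""
    children = tier_children(eaf_xml)
    depths = {}
    stack = [(tier_id, 0) for tier_id in children.get(None, [])]
    while stack:
        tier_id, depth = stack.pop()
        depths[tier_id] = depth
        stack.extend((child, depth + 1) for child in children.get(tier_id, []))
    return depths


if __name__ == "__main__":
    print(tier_depths(SAMPLE_EAF))
```

A position in this hierarchy is only one signal; the abstract states that tier content is used alongside it, and that content-based step is not shown here.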

Author: Nordhoff, Sebastian (Leibniz-Zentrum Allgemeine Sprachwissenschaft-ZAS Berlin)

Conference: LREC 2020 - Proceedings of the Workshop about Language Resources for the SSH Cloud (LR4SSHOC)

Date: May 2020

Publisher: European Language Resources Association
