Submission date: 
29 November 2021

This report summarises the work done in SSHOC Task T9.3, the Data Community Project for Electoral Studies, whose purpose was to develop a pilot Knowledge Graph (KG) in the field of Electoral Studies. The first sections of the report describe the background of and motivation for this purpose, and the manner in which the pilot KG was developed. Section 4 discusses a number of overarching insights derived from this work and from the experience gained in the process. These insights are expected to be relevant also for other efforts to develop KGs in sub-disciplinary domains of the social sciences and humanities. They relate particularly to issues of feasibility and of metadata quality. The most important ones are:

  • In spite of the wide variety of invaluable sources of metadata describing scholarly publications, datasets, and other relevant entities, available metadata are not without their problems. The most important of these include:
    • The granularity of many kinds of metadata, particularly those relating to the substantive character of the original resources, is often too coarse to be useful when focusing on subdomains of disciplines. This complicates the descriptive characterisation of the content of scholarly publications and datasets, particularly when no widely accepted sub-disciplinary controlled vocabularies exist, as is the case in the field of Electoral Studies.
    • Sources of metadata vary in coverage, richness, timeliness and quality, so KG developers need to invest considerable effort in comparing the strengths and weaknesses of the available sources across all of these aspects.
    • Quality problems are endemic for some types of metadata, most notably incompleteness, which sometimes reaches levels of 80% for centrally important types of metadata from highly reputed sources.
    • The quality of metadata about the data used is generally poor as a result of poor data citation practice in the social sciences and humanities. The quality of data citations lags far behind that of citations to scholarly and other literature. Often, data citations (if available at all) consist of a free-form text string that is not directly machine-actionable. Improvements require sustained effort from data creators and disseminators, but also from publishers, journal editors, and authors of scholarly literature.
  • End-users of KGs must be aware that KGs are not and cannot be the definitive answer to all their information needs. Particularly in view of high rates of missingness for some types of metadata, KGs cannot be expected to exhaustively identify instances of the desired kinds of information.
Publication type: