Submission date: 24 April 2020

This report constitutes SSHOC Deliverable 9.7 Design and Planning of Knowledge Graph in Electoral Studies. It describes the approach as well as the implementation plan for the SSHOC Pilot project in the field of electoral studies, which will constitute Deliverable 9.9.

After a brief introduction that sets out the purpose of this report, Section 2 elaborates the substantive domain to be covered by the Knowledge Graph (KG) and subsequently develops the concept of a KG and the reasons for developing such infrastructural tools. The substantive domain is contained within the field of Electoral Studies, which was described in detail in Deliverable 9.6 Demarcation Report of Electoral Studies User Community. For reasons of practicality, this domain has been further specified in several steps. First it was specified as pertaining to the (sub)field of citizen/voter behavior, and subsequently it was narrowed down further to the field of (studies of) electoral participation. These choices imply that the KG to be developed will not cover the entire field of electoral studies, but that it will be centrally located within the wider field and that its relevance for end-users is not restricted to those who are primarily interested in electoral participation.

The aspired functionality of the pilot-KG is discussed in Section 3. It identifies scholars engaged in empirical research as the primary intended audience of end-users, but expects the KG in its later development stages to be relevant also to journalists, think tanks, government agencies, corporations, political parties and politicians, and individual citizens. Functionalities to be sought relate to (a) focussed searching based on domain-specific criteria not available in other search tools; (b) teaching; and (c) research.

Section 3 also discusses the main datasets and kinds of publications to be covered by the pilot-KG. The overwhelming majority of these datasets are in the public domain, while a considerable part of the relevant publications is not (at least not for the first years after publication). This poses challenges that will be addressed in the work program of Deliverable 9.9.

Section 4 discusses how the team developing the pilot-KG plans to involve the user community of electoral studies. Recruitment for such involvement is planned to be done via relevant scientific conferences (which are identified in Section 4) and the authorship of publications in relevant scientific journals; recruited volunteers will be mainly tasked with pre-structured coding and testing. In addition, smaller groups of community members will be personally invited based on their expertise and their willingness to commit somewhat more of their time to assist in the development of an ontology and coding schemes, and in the design of testing phases.

Section 5 discusses the development of an ontology, which is necessary in view of the de facto absence of ontologies, classification schemes or controlled dictionaries that would be able to classify substantive content within the field of electoral studies. The section discusses approaches for ontology development and how these will be applied in the context of the development of the intended pilot-KG. In this context, this section also discusses the governance and management of the KG and its ontology.

Section 6 presents technical specifications for processes such as data ingestion, data cleaning, data authoring, data linking, data enrichment, data provisioning and data analysis. It also discusses the technical environment of semantic middleware to be used (for which PoolParty was chosen).

Section 7 discusses testing and user-community involvement in that process.

Section 8 discusses post-delivery development issues, and Section 9 presents a plan in terms of tasks and timelines.