20 January 2022

In Social Sciences and Humanities (SSH), citation of data is an indispensable part of the research process because it allows researchers to ensure reproducibility and transparency of the research process, as well as give credit to the creators and funders of the data. It also provides visibility. However, experience shows that practices of data citation are very diverse and making it fully machine-actionable across scientific fields is in fact very complicated. The main culprits are the incompleteness of existing citation methods and the complexity of the necessary technical landscape which make the challenge even greater.

The SSHOC project members, mainly WP3/T3.4, first addressed this issue by creating an inventory of citation practices, and based on what was discovered moved on to developing  specific recommendations for SSH. A software prototype was also created in order to help the community make SSH datasets citable, to visualise and exploit citations and to provide facilities for curation and semantic annotation of these resources. In December, SSHOC organised a webinar FAIR SSH Data citation: practical guide with the aim to present those recommendations as well as talk more thoroughly about the value and necessity of data citation. Some tools that can be used, for example the FAIR SSH Citation prototype, and ideas for data-based scholarship were also presented.


Event page


Data citation in SSH

Given the many different disciplines within SSH, the current data citation practices are very diverse. As Nicolas Larrousse (Huma-Num/CNRS) pointed out, SSH lack a common approach to data citation. Nevertheless, there are a number of recommendations already available (DASISH Project, ICPSR, CESSDA, SHARE, W3C's Web Annotation Data Model, RDA Data Citation of Evolving Data, etc.) that allow researchers to cite the data in such a way that it can be shared and reused across different hardware and software platforms.


Data Citation Recommendations

Edward Gray (Huma-Num/CNRS) noted that a few years ago it might have been enough to cite one source as a simple string of bibliographic information that include author and title in Chicago style, APA or MLA, but today this is no longer sufficient. In order to make better data citation, a set of recommendations based on FORCE11 has been developed in the framework of SSHOC. The recommendations have been further developed and adapted to the needs specific to SSH, mainly based on peer review and the feedback from the Round Table of Experts.

Recommendations for FAIR Data Citation in the Social Sciences and Humanities are published in a document that describes each recommendation separately. More specifically, it lists eight recommendations – Importance, Credit and Attribution, Evidence, Unique Identification, Access, Persistence, Specificity and Verifiability, and Interoperability and Flexibility. The screenshot below shows how each of the recommendations is tackled from three different perspectives: in the first column, the general, societal and technical challenge with data citation is explained, the second column brings practical recommendations to address this problem, and the last column specifies the expected outcome.

Source: Recommendations for FAIR Data Citation in the Social Sciences and Humanities, p. 6

These recommendations are aimed at all stakeholders in SSH, from researchers and engineers to funders and research infrastructures. However, the Recommendations for FAIR Data Citation in the Social Sciences and Humanities also list a number of use cases that can help any reader identify whether the recommendations might be beneficial  in their specific case. The use cases that are further elaborated in the document, include:

  • I am a Researcher and/or an Engineer working for a project
  • I am a Research software engineer working for a project and/or research infrastructure.
  • I am a Manager of a data repository.
  • I am a Data Librarian, a Data Steward or an Open Science Officer.
  • I am a Research Funder.
  • I am a Researcher who is conceiving a research project.
  • I am a member of the public who wishes to re-use data.

Data Citation Prototype

The FAIR SSH Citation prototype presented by Cesare Concordia (ISTI-CNR) is a software tool designed for harvesting metadata.

The main functionalities of the prototype are:

  • exploring metadata on datasets,
    • retrieving metadata from landing pages via PIDs (DOIs, handles and others) or URLs,
    • retrieving information from other sources like APIs knowledge graphs etc.
  • providing facilities for curation and semantic annotation of citations,
  • visualising and exploiting citation metadata, and
  • disseminating metadata.

The prototype can be tested by anyone interested in the tool through the following services:

Alternatively, citations from the abstracts of all (ADHO) DH conferences 2015–2020 and from DHQ journal articles can be checked on this link.


Citation alone are useless

To ensure good data citation you need a complete ecosystem. First, there needs to be good documentation, such as the Recommendations for FAIR Data Citation in the Social Sciences and Humanities, where you can find general and specific information on data citation in SSH. The community also needs norms and standards, such as specific vocabularies that are machine-actionable (e.g. ORCID ID), and the storage of data in trusted repositories that provide PIDs. Finally, to ensure good citation practices, one needs high quality dissemination tools that not only automatically disseminate but also ensure visibility of the data.


Want to know more?

Many other useful insights were shared during the webinar, so we invite you to watch the webinar recording and look through presentation slides. If you want to dig further, you can have a look at the blogpost and recording of our previous workshop on data citation in practice.

If you are interested for more, do not miss next SSHOC events: