Date: 03 June 2020
 

The heritage and archaeology sectors are producing increasing volumes of very diverse data through extensive use of digital technologies and data analysis. Data in this area is often derived from non-repeatable interventions, given the processes and the uniqueness of the objects and sites studied. It is also highly multidisciplinary and combines a wide range of methodologies and forms of data, often adapting novel technology, which can lead to the data being particularly fragile and subject to obsolescence. Under these conditions, management of data is complex. The purpose of the webinar was to present the latest guidance and best practices through the lens of data use and reuse, offering principles and learnings that are not only relevant to the sector, but can be adapted to other science and humanities contexts. 

The Social Sciences and Humanities Open Cloud (SSHOC), the European Research Infrastructure for Heritage Science (E-RIHS) and the Saving European Archaeology from the Digital Dark Age COST Action (SEADDA) are three leading actors in developing and promoting good practice across the full data cycle for archaeological and heritage science data. On the occasion of the SEADDA Exploratory Workshop on Use and Reuse of Archaeological Data at the end of March 2020, the three initiatives came together to offer a day of knowledge sharing and discussion focused on heritage science and archaeological data. The event was chaired by Julian Richards (University of York) and brought together a pool of speakers with diverse backgrounds and expertise: Holly Wright (University of York), Jessica Hendy (University of York), Scott Orr (UCL) and Alejandra Albuerne (UCL).

 

FAIR heritage science and archaeological data

The FAIR principles (Findable, Accessible, Interoperable and Reusable) are broadly accepted for research data, but their implementation can be difficult. Holly Wright presented the data curation policy framework recently developed by E-RIHS for heritage science data, which provides guidelines[1] for meeting the FAIR principles. Making data open is not enough to ensure its use and re-use: it is also necessary to understand how data is being reused from both a qualitative and a quantitative perspective. This has been a key objective for E-RIHS.

The guidelines are organised around the four components of FAIR:

  • Findability can be achieved through the use of persistent identifiers for datasets and researchers, as well as the development of relevant metadata schemas and standards for heritage science by the relevant communities.
  • Accessibility can be enhanced by making datasets open wherever possible. It was recognised during the session that there are numerous instances in the sector where data privacy prevails, in which cases it is recommended that metadata is made open to make the data discoverable. Repositories with well-defined access conditions are also a must.
  • Interoperability requires standards that are both human- and machine-readable and the use of non-proprietary file formats. E-RIHS intends to offer metadata models as part of the resources in their DIGILAB.
  • Re-usability relies on producing data that is ready for future research processing. This requires systematic documentation, version control and even file naming, as well as sufficiently rich and consistent metadata that informs about provenance, methodology and equipment, among other things. Licensing is very important and needs to be clear. CC0 licensing is recommended for metadata and CC-BY for datasets wherever possible.
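
As an illustration of the points above, the following is a minimal sketch of a dataset record that carries a persistent identifier, provenance information and explicit licences, together with a check for missing fields. It is not the E-RIHS metadata model: the class name, the field names, the choice of required fields and the licence strings are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List

# Fields this sketch treats as mandatory (an assumption, not an E-RIHS rule).
REQUIRED_FIELDS = (
    "identifier",
    "title",
    "creator_ids",
    "provenance",
    "methodology",
    "metadata_license",
    "data_license",
)


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record for a heritage science dataset."""

    identifier: str          # persistent identifier for the dataset, e.g. a DOI
    title: str
    creator_ids: List[str]   # persistent researcher identifiers, e.g. ORCID iDs
    provenance: str          # where the data came from (site, object, campaign)
    methodology: str         # how the data was obtained
    equipment: str = ""      # instrument or software used, if known
    metadata_license: str = "CC0-1.0"   # recommended for metadata in the text above
    data_license: str = "CC-BY-4.0"     # recommended for datasets where possible


def missing_fields(record: DatasetRecord) -> List[str]:
    """Return the names of required fields that are empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name)]
```

Calling `missing_fields` on a record before deposit gives a quick list of gaps to fill, for instance a dataset that has not yet been assigned a persistent identifier.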

 

Data archiving for re-use in heritage and archaeological science

One of the key strategies for enabling the re-use of data is effective data archiving. Jessica Hendy reflected on her work on molecular biology in archaeological science to offer her perspective on the practice of data archiving, discussing the possibilities and challenges it brings.

The benefits of archiving data are many. It allows for a thorough analysis of the interpretation and quality assurance. It enables the replication of data analysis strategies during peer review and afterwards. It also provides a means of long-term data storage that is not tied to a specific institute. Data archiving is key to allowing future research to reanalyse existing data when new strategies are developed. There is also an increasing demand for transparency on how data was generated, both in the lab and computationally, which is promoting the recording of laboratory protocols using protocols.io and of computational processing using GitHub or Bitbucket.

Nonetheless, data archiving also brings certain challenges. For example, datasets can be massive, so only institutions with sufficient computational support can analyse the available data. This can lead to those institutions dominating the field. In addition, data in heritage science and archaeology can be highly specialised, requiring specialist knowledge to interpret it critically. In addressing these challenges, Jessica Hendy suggested that data exploration should be made more easily accessible and not exclusively of interest to just a few research groups. This can be done, for example, through online processing capacity. In addition, since there remains a lack of awareness and communication of what data is produced and how it is stored, it is necessary to raise awareness among research partners and collaborators about the importance of sharing data and respecting community standards.

Exploring best practice and tools for use and re-use

The last part of the webinar, delivered by Scott Orr, offered hands-on advice in the form of best practices and tools to plan for the re-use of scientific data in heritage and archaeological contexts. He suggested a question to guide the researcher in planning data management: if you come back to the data in a few years’ time, what will you want to know about it in order to interrogate it again?

Many points were discussed during this part with the following highlights:

  • Metadata and paradata are integral to the re-use of datasets, offering comprehensive information about the data and how it was obtained.
  • File formats need to be considered from the start. The researcher should consider the use of lossless files vs lossy files (e.g., RAW, TIFF or PNG instead of JPG or GIF).
  • When proprietary data is obtained, it should be considered whether the data can be exported to an open format, and what is the extent of the potential loss of information in the process. If any information is going to be lost in going from proprietary to open format, why not save the data in both formats?
  • 5-star open data is a deployment scheme for open data that goes from the most basic form of sharing data to the most comprehensive and linked way of making data available for re-use on the web. The main skills jump is in going from 3- to 4-star: from making your data available in a non-proprietary open format to doing so using Uniform Resource Identifiers (URIs) that identify common elements that can be searched across several databases.
  • It is important to think about re-use when it comes to analysis, in particular code and algorithms, which are part of the procedures used for processing and interpreting the data. Documentation and annotations, a clear format and version control are invaluable for re-use.
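
The file format advice above can be turned into a simple check at the start of a project. The sketch below flags image files that are not in one of the lossless formats named in the list. The function names and the extension sets are assumptions for illustration. The grouping follows the examples given above, a file extension is only a hint (a TIFF file can hold lossy-compressed data, and camera RAW files usually carry vendor-specific extensions), and GIF is grouped with the lossy formats because its 256-colour palette discards colour information from photographic images.

```python
from pathlib import Path
from typing import Iterable, List

# Extension sets are illustrative assumptions based on the examples in the text.
LOSSLESS_EXTENSIONS = {".raw", ".tif", ".tiff", ".png"}
LOSSY_EXTENSIONS = {".jpg", ".jpeg", ".gif"}


def classify_format(filename: str) -> str:
    """Classify a file as 'lossless', 'lossy' or 'unknown' from its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in LOSSLESS_EXTENSIONS:
        return "lossless"
    if suffix in LOSSY_EXTENSIONS:
        return "lossy"
    return "unknown"


def files_to_review(filenames: Iterable[str]) -> List[str]:
    """Return the files that are not recognised as lossless, in input order."""
    return [name for name in filenames if classify_format(name) != "lossless"]
```

Files returned by `files_to_review` are candidates for a closer look, for example to decide whether an original lossless capture still exists and should be archived alongside them.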

 

Learning from each other

All presentations were followed by lively discussions in which participants shared their knowledge and experience. There was ample discussion about the CARE principles for Indigenous Data Governance and how they can be generalised to other contexts where privacy needs consideration. Other topics of discussion were the role of DOIs and URIs in linking data to sites and objects, and the challenges around the speed of data publication and data embargoes in different fields of heritage and archaeological science. Finally, Ron Dekker, SSHOC coordinator and member of the EOSC Executive Board, shared information on the active role of SSHOC in supporting the SSH community, and underlined the specific need for SSH vocabularies.

Participate in knowledge sharing and learn more

If you want to know more, you can have a look at the webinar recording and the presentation slides.

E-RIHS wants to find out more about the different methodologies and workflows for data in heritage science and is calling for new case studies that will help it better understand how data is being used in the sector. If you have any suggestions, you can get in touch with Holly Wright.

In the coming months SSHOC will again join forces with key actors in heritage science data to prepare a workshop where leading experts from the field will discuss further aspects of the management of heritage data. To avoid missing it, follow SSHOC event announcements on the SSHOC website or the SSHOC Twitter account, or simply sign up for the SSHOC newsletter.

 

[1] The guidelines will be published this summer on the E-RIHS website, so keep an eye on E-RIHS updates.