07 December 2021

How can researchers and data stewards be assisted to support FAIR and Open Research, through better provision of training materials about these topics? What challenges do research catalogue providers face when providing learning materials for research data management (RDM) training? 

Successful compliance with the FAIR principles is crucially dependent on the thorough curation of research data. The current situation in the scientific community shows that providing efficient RDM training and support remains a complex issue. Therefore, many European-funded projects have developed catalogues to aggregate and disseminate learning resources for data stewards, researchers and trainers. In line with the aim of the European Open Science Cloud (EOSC), which creates the conditions for sharing and optimizing research data and services, the purpose of this Open Science Fair workshop was to raise awareness of existing platforms and exchange ideas to optimize these catalogues. The workshop followed two previous workshops on the harmonization of such catalogues, organised by FAIRsFAIR in collaboration with the EOSC-5-TF Skills and Training.


Workshop Overview


The Open Science Fair offered a great opportunity to reflect upon current trends and tools for RDM training. On 21 September, around 50 participants gathered online to join the workshop offered by members of the Open Science Training Coordinators Community of Practice, an interdisciplinary network of 95 trainers and training organizers from more than 20 countries. Established in 2018, the informal community aims to exchange best practices and broaden its knowledge on training activities of various pan-European, EOSC-related initiatives. 

Iryna Kuchma (EIFL) introduced the session on The RDM Training & Support Catalogue Landscape. Providing easy-to-use recommendations for RDM training is essential, but needs and practices regarding data stewardship training differ. This is why the workshop started with the presentation of two best practice examples that help with the standardization of training materials. Elizabeth Newbold (STFC), a member of the Research Data Alliance Interest Group on Education and Training on handling of research data, introduced the audience to a minimal metadata set intended to enable discovery of learning resources across catalogues. The core set that the Interest Group is proposing for adoption by learning resource creators and service providers contains 14 different elements.



Afterwards, Allyson Lister (FAIRsharing/Oxford University) and Laura Molloy (CODATA) presented Terms4FAIRskills. This formalised terminology “describes the competencies, skills and knowledge associated with making and keeping data FAIR”, as stated on the website. It contains around 500 terms related to the creation and assessment of data stewardship curricula. These terms facilitate the definition of skills for use in job descriptions and controlled vocabularies, encouraging consistent, recognised and structured description of FAIR data competencies.


Figure: The Terms4FAIRskills model


Getting Familiar with RDM Catalogues 


The third part of the workshop focused on the presentation of different training catalogues. Five experts from the field of open science shared information on the main characteristics of the catalogues. They introduced their tools with regard to general considerations concerning training practices and touched upon the challenges linked to providing training resources on FAIR data, open science and domain-specific training in the catalogues.


EOSC-Pillar Training and Support Catalogue

Paula Oset Garcia from Ghent University introduced the Training and Support Catalogue developed by EOSC-Pillar. It is designed as a collection of training and support materials for data stewardship and research data management support. The catalogue contains not only training materials but also other related materials that can be used to support researchers with implementing RDM best practices.

SSH (Social Sciences and Humanities) Training Discovery Toolkit 

The SSH Training Discovery Toolkit presented by Ellen Leenarts (DANS) is an inventory of various learning resources on RDM, open science and domain-specific training materials that trainers of different disciplines in the SSH can use to find materials for re-use in their own training activities. The toolkit is being updated continuously in close cooperation with the SSH training community.


DARIAH Campus

DARIAH Campus is a discovery platform that hosts training resources from the Digital Research Infrastructure for the Arts and Humanities (DARIAH) Community and DARIAH-affiliated projects. Vicky Garnett from DARIAH-EU explained that resources on DARIAH Campus relate directly to issues around open science training and data management, most of which are already formatted for use among learners.

ELIXIR Training Portal TeSS

The ELIXIR Training portal TeSS is a registry that allows for browsing, discovering and organising life sciences training resources (events and materials). It is run by ELIXIR (Europe's distributed infrastructure for life-science data). Celia van Gelder (DTL/ELIXIR) highlighted that over 70 training content providers have registered their resources in TeSS.

EOSC Future Training Catalogue

Lucia Vaira from LifeWatch ERIC explained the EOSC Future Training Catalogue. As part of the follow-up EOSC project EOSC Future, it will be a hub of training resources that is intended, first, to bring training catalogues together and, second, to align the diverse EOSC community in terms of quality and interoperability. As a special feature, it will support metadata harvesting of external catalogues and portals.


First Catalogue Challenge: Controlled Vocabularies


As seen above, each catalogue has its unique focus and features, yet they all encounter similar challenges. Therefore, the last part of the workshop consisted of a lively discussion about three challenges that catalogue providers face. A poll for each challenge, followed by a discussion of the audience's answers, gathered important input for further discussion.

Controlled vocabularies enable effective information retrieval and resource discovery by adhering to a consistent description of resources. The most prevalent challenges in the realm of controlled vocabularies include: 

  1. Selecting suitable controlled vocabularies and establishing data curation frameworks,
  2. Creating community agreement and establishing structured decision-making about new controlled vocabularies,
  3. Ensuring interoperability within the catalogue ecosystem that is developing constantly. 

The main takeaway is that cooperation and coordination between catalogue providers is key to creating more unified, and therefore more successful, controlled vocabularies. Inter-organizational exchange is crucial for identifying best practices, e.g. with regard to questions such as: Who gets to decide on specific controlled vocabularies for a catalogue? How can we ensure consensus on vocabularies? In addition, participants discussed the challenge of adding or changing a controlled vocabulary for a specific catalogue. This is not a trivial task, and there is a need for mapping between different controlled vocabularies.
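
To make the idea of mapping between controlled vocabularies more concrete, here is a minimal sketch in Python. It assumes two catalogues that describe resource types with different terms. All term names, the field name resource_type and the helper functions are hypothetical and are not taken from any of the catalogues presented at the workshop.

```python
# Hypothetical crosswalk from one catalogue's "resource type" terms to
# another's. The terms are illustrative only, not from any real catalogue.
CROSSWALK = {
    "tutorial": "learning-material",
    "slide deck": "presentation",
    "webinar recording": "video",
}


def map_term(term, crosswalk=CROSSWALK):
    """Return the target-vocabulary term, or None if no mapping exists."""
    return crosswalk.get(term.strip().lower())


def map_record(record, field="resource_type"):
    """Map one catalogue record; return (mapped_record, unmapped_terms)."""
    mapped, unmapped = [], []
    for term in record.get(field, []):
        target = map_term(term)
        if target is None:
            # Keep track of terms that still need a community decision.
            unmapped.append(term)
        else:
            mapped.append(target)
    new_record = dict(record)  # shallow copy, the input stays unchanged
    new_record[field] = mapped
    return new_record, unmapped


record = {"title": "Intro to RDM", "resource_type": ["Tutorial", "Podcast"]}
mapped_record, unmapped = map_record(record)
print(mapped_record["resource_type"])  # ['learning-material']
print(unmapped)                        # ['Podcast']
```

The list of unmapped terms is the part that needs human attention: it shows where catalogue providers would still have to agree on a shared term or extend the crosswalk.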

Participants also highlighted the positive impact of a user-centered approach. Ultimately, what matters is that users know how to use the catalogues to find the information they need. Therefore, catalogue providers first have to get to know their key user groups and find out what they need. User-friendliness will lead to more frequent use and thus higher visibility of the catalogue.

The last point from the group exchange was the importance of finding domain-agnostic high-level terminology that is based on components of the Web Ontology Language (OWL) or follows the Simple Knowledge Organization System (SKOS). Vocabulary that is used by multiple platforms aids interoperability and data sharing.


Second Catalogue Challenge: The Curation Process


How do we improve catalogue curation processes? This is the second challenge that the group dedicated its attention to, and in particular focused on: 

  1. Finding a systematic curation approach and quality evaluation criteria,
  2. Issues with funding for curation and updates after the project ends,
  3. Measures to judge the quality of discipline-specific content.

A first group of answers centered on building incentives to make proper curation more attractive. That includes making the efforts of curation and editorial teams visible, e.g. by integrating their curation activities into their ORCID profiles. User engagement also helps increase the size of the curation team, for instance by capturing users’ feedback through open peer reviews and feedback sprints.

Participants stated that agreeing on curation guidelines for training resources would help with quality management. Since the number of published training materials is usually huge, catalogues would benefit from overarching quality management processes. Establishing criteria that can be used to evaluate the catalogue entries supports standard setting and FAIRness. But “how responsible are we for the quality of the resources we provide?” Catalogue hosts function as service providers and not necessarily as data producers, so this is a significant question to address, since the main responsibility for quality lies with the authors of the training resources.


Third Catalogue Challenge: Sustainability


Ensuring the longevity of catalogues after the end of the project is a fundamental issue that the majority of catalogue providers face, as the feedback indicated. In particular:

  1. Data curation and maintenance,
  2. Lack of provision for integrating catalogues across similar initiatives.

One example is data curation after the EOSC-Pillar project ends. For the SSHOC Training Discovery Toolkit, it remains unclear where the data will be stored. This catalogue challenge received the fewest Mentimeter answers, indicating how difficult catalogue sustainability is.

The participants recommended planning ahead, for example by getting informed about future funding opportunities, follow-up projects or similar initiatives before a project ends. Adhering to standards so that the data is easier to transfer to other platforms is also highly beneficial. The DARIAH-Campus Reuse Charter is a best practice example for sustainable data management. Such community-validated quality criteria for training resources facilitate the transfer to other catalogue providers.


Final Thoughts


Bringing together a group of engaged people from the field of open science created an enriching atmosphere for mutual learning about catalogues. Of course, the three challenges could not be solved within two hours, but this kind of dialogue is what matters: it fosters new learning impulses and forms of cooperation. The participants addressed issues that are crucial for the future of catalogues: standard setting, cooperation between stakeholders, data curation, and ultimately sustainability. Only if catalogue providers manage to create long-term visions for their data repositories will they have a lasting impact on FAIR and open science practices in the EU and beyond.

The workshop recording is available here and the slides here.


Written by Judith Wehmeyer, research associate at GESIS – Leibniz Institute for the Social Sciences and member of the SSHOC project