Online seminar on PIDs for research data

Thursday, 15 February 2024, from 13:00 to 16:30

In addition to keynote speeches on best practices and concrete application examples for the persistent identification of research data, the seminar included an interactive part on needs and challenges.

Research with complex scientific and technical infrastructures, such as particle accelerators, satellites or research vessels, generates a high volume of digital research data. This data is of central importance for the reproducibility and traceability of scientific results. Persistent identifiers (PIDs) assign these digital resources a unique and permanent reference. This means that the data can be identified, found and cited in the long term, regardless of changes in storage locations or other metadata.

The online seminar "PIDs for research data", organised by the PID Network Germany project, was dedicated to the importance of PIDs for research data. Held as part of the "Love Data Week 2024" campaign, the seminar offered an insight into areas of application and best practices. More than 260 people registered for the event. The handling of research data at the Ludwig-Maximilians-Universität München and in the Gemeinsame Normdatei (GND) was outlined as an example. In addition, the importance of PIDs in the National Research Data Infrastructure (NFDI) was presented as part of the PID4NFDI project. The Helmholtz Metadata Collaboration (HMC) also presented the use of PIDs in a knowledge graph as a concrete application example. The speakers' slides are linked in the table below and can also be viewed on Zenodo.

Following the presentations, the organisers and participants jointly identified needs relating to the persistent and unique identification of research data. They discussed solutions for overcoming obstacles in the application, implementation and dissemination of identifiers in scientific and cultural institutions in Germany. Around 30 participants exchanged ideas in small groups. The results were recorded on a Miro board and are summarised below.

Summary of the interactive part

The group exchange was divided into two main topics. Firstly, the participants discussed how they use PIDs in the field of research data and why. They then discussed which aspects they feel are missing in the current use of PIDs for research data. The discussion drew on the participants' diverse experiences and fields of activity within science in order to shed light on the use of PIDs for research data.

The participants emphasised the benefits of using and implementing PIDs for research data. Increased visibility plays a major role in particular, linked to the desire to have research achievements recognised. Findability, citability and unambiguous identification were named as further advantages. Interoperability and networking were discussed in relation to the FAIR principles, as was the linking of research data with publications and with information on research instruments and samples. The exchange showed a broad spectrum of PID use for research data: from no fields of application at all, through pure awareness-raising and passive use (retrieval/reuse of metadata), to the active assignment of PIDs to research datasets in institutions' own repositories.

In the second part of the exchange, the participants talked about obstacles and possible solutions for implementing PIDs for research data. They identified a lack of awareness among researchers and a lack of (central) advisory services or training offers as obstacles, and expressed the desire for clear regulations and guidelines. They also wished for coordination and agreement within the scientific communities and institutions, particularly with regard to the use of metadata schemas. In addition, the benefits of established workflows, the integration of infrastructures into local workflows and possibilities for automated linking were discussed as future tasks. The multiple assignment of PIDs and a lack of institutional resources were cited as additional obstacles. The handling of sensitive data, ethical frameworks and consideration of the CARE principles were also considered to be insufficiently addressed. The online seminar was conducted using Zoom and the event language was German.

Programme:

Time	Agenda	Speaker	Documentation
13:00-13:05	Welcome, programme and methods	Antonia Schrader (Helmholtz-Gemeinschaft)	Slides
13:05-13:20	Overview lecture: What do PIDs do for the research data ecosystem?	Paul Vierkant (DataCite)	https://doi.org/10.5281/zenodo.10665361
13:20-13:35	Dealing with research data PIDs at the Ludwig-Maximilians-Universität München	Vanessa Gabriel (LMU)	https://zenodo.org/records/10671033
13:35-13:50	Appealing authority: GND for researchers	Barbara Fischer (DNB)	https://zenodo.org/records/10631161
13:50-14:00	Break
14:00-14:15	PID4NFDI	Tibor Kalman (GWDG)
14:15-14:30	The use of PIDs for entity resolution within the Helmholtz Knowledge Graph	Volker Hofmann (Forschungszentrum Jülich / Helmholtz Metadata Collaboration (HMC))	https://zenodo.org/records/10723293
14:30-14:40	Introduction to the interactive part of the seminar	Antonia Schrader (Helmholtz-Gemeinschaft)
14:40-15:25	Group work, part 1	All participants
15:25-15:30	Break
15:30-16:15	Group work, part 2	All participants	Miro board export (subsequently organised thematically by the project)
16:15-16:30	Wrap-up and goodbye	Antonia Schrader (Helmholtz-Gemeinschaft)	see above

All interested parties were invited to take part in this open exchange, regardless of whether they already had experience with PIDs for research data.

If you have any questions or suggestions, you can contact us at any time at info.pidnetwork@listserv.dfn.de.

Thank you for your participation!

The project partners of PID Network Germany are DataCite, the German National Library, the Helmholtz Open Science Office, the German National Library of Science and Technology (TIB) and Bielefeld University Library. The project is funded by the German Research Foundation.