How can we ensure that the links between publications and data are accessible and usable? [Let’s Talk Open Science #13]

Written by Léo Raimbault

In the final webinar of the Let’s Talk Open Science series for 2025, a cross-disciplinary perspective was offered on a key issue: the link between scientific publications and research data.

Why is this link so important for open science? How are national infrastructures organised to ensure effectiveness, interoperability and sustainability? And, in practical terms, how does a research project make use of these mechanisms?

To answer these questions, three complementary perspectives were brought together:

  • Yannick Barborini (CCSD), Head of Development at HAL and Technical Director of the HALiance Project.
  • Nicolas Larrousse (CNRS Huma-Num), Head of the National and International Community Coordination Division for the Nakala Repository.
  • Amandine Wattelier-Bricout (CNRS, CESAH), PhD in Indian Studies and HAL Ambassador.

Publications without associated data: still too common a scenario

Who hasn’t encountered this situation before? You find a fascinating article with promising results, but then you discover that the link to the data is broken, the server has been shut down, or the files cannot be found.

This is not an isolated case. According to the French Open Science Barometer, while over 70% of French publications in 2023 mention the use of data, only a minority indicate that it is actually shared. In the social sciences and humanities, this figure falls to just 4–5%.

However, the benefits of sharing are well documented: articles associated with datasets deposited in a repository enjoy a citation advantage of up to 25%. Beyond bibliometric impact, the issue is primarily scientific, concerning transparency, reproducibility, the reuse of research products, and interdisciplinarity.

Therefore, linking publications, data and software is not merely an administrative formality, but a structural lever for making research more robust and visible.

A national framework: towards an interconnected ecosystem

The Second National Plan for Open Science (2021) recognises the importance of establishing an ecosystem that connects the various outputs of research, such as publications, data and software.

This structure is based on several essential building blocks:

  • persistent identifiers (e.g. DOI, ARK, SWHID);
  • structured metadata;
  • interoperability protocols.
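
To make the first building block more concrete, the Python sketch below recognises the three identifier schemes mentioned above from their syntax alone. The regular expressions are simplified assumptions rather than the official DOI, ARK and SWHID grammars, and `normalise` and `classify_pid` are hypothetical helpers, not part of any HAL or Nakala API.

```python
import re
from typing import Optional

# Simplified syntax checks (an assumption for illustration): the official
# DOI, ARK and SWHID grammars differ in detail from these patterns.
PID_PATTERNS = {
    "doi": re.compile(r"^10\.\d{4,9}/\S+$"),
    "ark": re.compile(r"^ark:/?\d{5,}/\S+$"),
    "swhid": re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}(;\S*)?$"),
}

# Resolver prefixes often pasted in front of a bare DOI.
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def normalise(raw: str) -> str:
    """Trim whitespace and drop a DOI resolver prefix, if any."""
    value = raw.strip()
    for prefix in DOI_PREFIXES:
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    return value


def classify_pid(raw: str) -> Optional[str]:
    """Return 'doi', 'ark' or 'swhid' if the string matches a scheme, else None."""
    value = normalise(raw)
    for scheme, pattern in PID_PATTERNS.items():
        if pattern.match(value):
            return scheme
    return None


print(classify_pid("https://doi.org/10.5281/zenodo.123"))  # doi
print(classify_pid("see the data on our website"))  # None
```

A syntax check only says what kind of identifier a string claims to be; confirming that it actually resolves requires querying the relevant resolver.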

Against this backdrop, a collaborative initiative emerged between HAL, Recherche Data Gouv and Nakala, initially launched as part of the European EOSC-Pillar project and subsequently consolidated within the HALiance structuring project.

HAL-Nakala: from proof of concept to production
Standardise relationships

Links between publications and data have long existed, but they were heterogeneous: entered in a free-text field, slipped into a description or embedded in the title. Without standardisation, these references were hard to harvest and interpret, and so remained largely under-exploited.

The work carried out therefore consisted of:

  • overhauling the system for creating relationships in HAL;
  • using the standardised DataCite vocabulary to categorise relationships;
  • adding input aids and validation mechanisms (e.g. DOI checks and identifier verification).
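
As an illustration of the last two points, here is a minimal Python sketch of how a relationship entered by a depositor might be checked against the DataCite vocabulary and a DOI syntax rule. It assumes a subset of DataCite `relationType` values and a simplified DOI pattern; the `Relation` record, the `validate` function and the identifiers used are hypothetical, not HAL's actual implementation.

```python
import re
from dataclasses import dataclass

# A subset of the DataCite relationType vocabulary; the full list is longer.
DATACITE_RELATION_TYPES = {
    "Cites", "IsCitedBy", "References", "IsReferencedBy",
    "IsSupplementTo", "IsSupplementedBy", "IsPartOf", "HasPart",
    "IsDerivedFrom", "IsSourceOf", "Documents", "IsDocumentedBy",
    "Describes", "IsDescribedBy", "IsVersionOf", "HasVersion",
}

# Simplified DOI syntax: "10." + registrant code + "/" + suffix.
DOI_SYNTAX = re.compile(r"^10\.\d{4,9}/\S+$")


@dataclass
class Relation:
    source_id: str      # identifier of the record being edited (hypothetical)
    relation_type: str  # should be a DataCite relationType value
    target_doi: str     # bare DOI of the related object


def validate(rel: Relation) -> list[str]:
    """Return human-readable problems; an empty list means the relation passes."""
    problems = []
    if rel.relation_type not in DATACITE_RELATION_TYPES:
        problems.append(f"unknown relation type: {rel.relation_type!r}")
    if not DOI_SYNTAX.match(rel.target_doi.strip()):
        problems.append(f"malformed DOI: {rel.target_doi!r}")
    return problems


print(validate(Relation("hal-00000001", "IsSupplementedBy", "10.5555/example-dataset")))
print(validate(Relation("hal-00000001", "SeeAlso", "doi 10.1234")))
```

Rejecting free-form relation labels at entry time is what makes the resulting links machine-readable later; a production check might also resolve the DOI, which this sketch does not attempt.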

Exchange information between repositories

However, the decisive step lies in automating the exchange of this information between repositories. To achieve this, the teams relied on the international COAR Notify protocol, which enables one platform to notify another when a relationship has been established. The reciprocal relationship can then be generated automatically. This system has been operational between the two platforms since June 2025.
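
The exchange can be sketched in Python as follows: one repository builds a notification announcing a relationship, and the receiving repository derives the reciprocal link from it. The payload is only loosely modelled on the COAR Notify "Announce Relationship" pattern (Activity Streams 2.0 JSON-LD) and should be checked against the specification. Using DataCite type names as the relationship value, the inbox URL layout and all `example.org` addresses are assumptions, and `announce_relationship` and `reciprocal` are hypothetical helpers.

```python
import uuid
from typing import Optional, Tuple

# Inverse pairs for a subset of DataCite relation types.
_PAIRS = [
    ("Cites", "IsCitedBy"),
    ("References", "IsReferencedBy"),
    ("IsSupplementTo", "IsSupplementedBy"),
    ("IsPartOf", "HasPart"),
    ("IsDerivedFrom", "IsSourceOf"),
    ("Documents", "IsDocumentedBy"),
]
INVERSE = {a: b for a, b in _PAIRS}
INVERSE.update({b: a for a, b in _PAIRS})


def service(base_url: str) -> dict:
    """Describe a repository as a service with an inbox (hypothetical URL layout)."""
    return {"id": base_url, "inbox": base_url.rstrip("/") + "/inbox", "type": "Service"}


def announce_relationship(subject: str, relation: str, obj: str,
                          origin_url: str, target_url: str) -> dict:
    """Build a notification loosely modelled on COAR Notify's Announce Relationship."""
    return {
        "@context": ["https://www.w3.org/ns/activitystreams", "https://coar-notify.net"],
        "id": f"urn:uuid:{uuid.uuid4()}",
        "type": ["Announce", "coar-notify:RelationshipAction"],
        "actor": {"id": origin_url, "type": "Service"},
        "origin": service(origin_url),
        "target": service(target_url),
        "object": {
            "id": f"urn:uuid:{uuid.uuid4()}",
            "type": "Relationship",
            "as:subject": subject,
            "as:relationship": relation,  # a DataCite type name here (assumption)
            "as:object": obj,
        },
    }


def reciprocal(notification: dict) -> Optional[Tuple[str, str, str]]:
    """Receiving side: derive (subject, inverse relation, object) for the reverse link."""
    rel = notification["object"]
    inverse = INVERSE.get(rel["as:relationship"])
    if inverse is None:
        return None  # unknown or non-invertible type: left for manual review
    return (rel["as:object"], inverse, rel["as:subject"])


note = announce_relationship(
    "https://hal.example.org/hal-00000001", "IsSupplementedBy",
    "https://doi.org/10.5555/example-dataset",
    "https://hal.example.org/", "https://nakala.example.org/",
)
print(reciprocal(note))
```

Because each relation type has a well-defined inverse in the shared vocabulary, the receiving side can create the reverse link without any manual re-entry.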

The links created in this way are not confined to local interfaces. They are exposed in exports and APIs, published via DataCite, and expressed in the Scholix format, the international standard for describing links between data and publications. This makes them harvestable by aggregators such as OpenAIRE and gives them international visibility.
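
For readers unfamiliar with Scholix, the Python sketch below builds one link record between an article and a dataset. The field names follow our understanding of the Scholix schema and should be verified against it; the mapping of DataCite types onto Scholix's small set of top-level relationship names, the `scholix_link` helper and the example DOIs are assumptions for illustration.

```python
import json
from datetime import date
from typing import Optional

# Scholix restricts the top-level relationship name to a few values; the
# DataCite type is carried in SubType. This mapping is an assumption.
SCHOLIX_NAME = {
    "IsSupplementTo": "IsSupplementTo",
    "IsSupplementedBy": "IsSupplementedBy",
    "References": "References",
    "Cites": "References",
    "IsReferencedBy": "IsReferencedBy",
    "IsCitedBy": "IsReferencedBy",
}


def doi_identifier(doi: str) -> dict:
    """Wrap a bare DOI in a Scholix-style identifier object."""
    return {"ID": doi, "IDScheme": "doi", "IDURL": f"https://doi.org/{doi}"}


def scholix_link(source_doi: str, source_type: str, relation: str,
                 target_doi: str, target_type: str, provider: str,
                 published: Optional[date] = None) -> dict:
    """Build one link record; object types are e.g. 'literature' or 'dataset'."""
    return {
        "LinkPublicationDate": (published or date.today()).isoformat(),
        "LinkProvider": [{"Name": provider}],
        "RelationshipType": {
            "Name": SCHOLIX_NAME.get(relation, "IsRelatedTo"),
            "SubType": relation,
            "SubTypeSchema": "DataCite",
        },
        "Source": {"Identifier": [doi_identifier(source_doi)], "Type": {"Name": source_type}},
        "Target": {"Identifier": [doi_identifier(target_doi)], "Type": {"Name": target_type}},
    }


link = scholix_link("10.5555/example-article", "literature", "IsSupplementedBy",
                    "10.5555/example-dataset", "dataset", "Example repository",
                    published=date(2025, 6, 1))
print(json.dumps(link, indent=2))
```

Keeping the precise DataCite type in `SubType` while exposing a coarser shared name is what lets aggregators compare links coming from many different providers.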

Towards enriched publications

This infrastructure goes beyond simply displaying a link and paves the way for more ambitious developments. Discussions with the OpenEdition platforms aim to integrate data more closely into editorial environments. The collaboration with Software Heritage, which covers software, follows the same principle: establishing persistent links between all research outputs.

The technical network now in place provides a solid foundation for developing enriched or complex publications that integrate articles, data and code.

The Dharma project: connecting the building blocks in practice

Amandine Wattelier-Bricout’s presentation illustrated these issues in practice through the ERC Synergy Dharma project, which focuses on inscriptions from South and Southeast Asia. The project produces photographs of artefacts, detailed metadata, critical editions of texts, catalogues, and scientific articles. Every stage of the data lifecycle is documented, from the field mission to storage in the repository.

The images are deposited in Nakala, where they are assigned a DOI, while catalogues and pre-publications are deposited in HAL. The links established between these objects enable readers to navigate from the field report to the dataset and then to the scientific publication. This makes visible not only the final result but also the process of knowledge production.

A striking example is the documentation of a poorly catalogued collection in a university museum. By publishing a pre-publication catalogue and depositing the corresponding images, the project makes these resources findable and reusable by other researchers, well beyond its own scientific objectives. The data thus becomes a scientific object in its own right, embedded in a network of explicit relationships.

A collective challenge

Discussions with participants revealed that this interconnection work affects more than just HAL and Nakala. As interoperability standards become more widespread, other repositories are expected to gradually join this system. While the technical aspect is important, it is equally crucial that scientific communities take ownership of the system.

Linking publications, data and software enables research to be documented in depth. This involves showcasing not only the results, but also the materials and methods that made them possible. In this sense, the link between publications and data is one of the foundations of sustainable, fully integrated open science.
