Cryptology ePrint Archive and RepositoriUM: Two New Repositories Connected to Episciences

Written by Delphine Crubellier

Two new repositories are now interoperable with Episciences: the Cryptology ePrint Archive and RepositoriUM. With these connections, Episciences is now compatible with three widely used repository software solutions for data and document management: DSpace, EPrints, and Dataverse.

The Cryptology ePrint Archive is a disciplinary repository used by the cryptology research community. Connecting with this type of repository addresses a need expressed by journals or scientific communities wishing to publish on Episciences; in this case, the connection responds to a request from the jGCC journal.

RepositoriUM is the institutional repository of the University of Minho (Portugal). This new partnership was established under the Confederation of Open Access Repositories (COAR) framework and involves the implementation of the COAR Notify protocol. This will allow researchers from the Portuguese institution to submit their preprints to Episciences journals directly from the RepositoriUM repository. The connection is currently undergoing validation and will be available after a final testing phase.

Repositories are added in response to needs expressed by journals or scientific communities, at the request of the institutions hosting the repositories, or to expand Episciences’ geographic coverage. These connections strengthen the platform’s impact within the scientific publishing landscape.

An editorial model based on interconnection with open repositories

Episciences’ publishing model relies on its ability to connect with diverse open repositories. The evaluation and publication services offered by the platform are built upon open repositories, following the overlay journal model. Episciences does not host publications; instead, it retrieves documents and their metadata from the repositories where they have been deposited. Journals select the servers most relevant to their discipline, while authors retain responsibility for archiving their documents (green open access). This approach, among other benefits, facilitates compliance with the rights retention strategy of Plan S. Episciences is currently connected to arXiv, bioRxiv, medRxiv, HAL, Zenodo, Software Heritage, DaRUS, Recherche Data Gouv and Arche.

Technical challenges of connecting platforms: how to link them together?

The connection process depends on the repository and the technical solution it uses, as well as the metadata format it supports. At a minimum, the repository must provide a service URL compatible with at least one standard format, such as Dublin Core, which serves as a common language between Episciences and the repository. Enriched formats (e.g., OpenAIRE) can also be used but are optional.
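As an illustration of the minimum requirement, the sketch below checks which metadata formats a repository advertises through the standard OAI-PMH `ListMetadataFormats` verb. It is not Episciences' code: the base URL, the function names, and the sample response (including the `oai_openaire` prefix) are hypothetical, and the response is parsed from a string rather than fetched over the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_formats_url(base_url: str) -> str:
    """Build the OAI-PMH request asking which metadata formats are exposed."""
    return f"{base_url}?{urlencode({'verb': 'ListMetadataFormats'})}"


def supported_prefixes(response_xml: str) -> set[str]:
    """Extract every metadataPrefix announced in a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    return {
        el.text.strip()
        for el in root.findall(".//oai:metadataPrefix", OAI_NS)
        if el.text
    }


# Hypothetical response, shortened for illustration.
SAMPLE_FORMATS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_openaire</metadataPrefix>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

prefixes = supported_prefixes(SAMPLE_FORMATS)
# Dublin Core is the baseline; richer formats are optional extras.
can_connect = "oai_dc" in prefixes
```

In this sketch, a repository qualifies as soon as `oai_dc` appears in the set; any enriched prefix would simply be preferred when present.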

Regardless of the repository, some aspects of the work remain consistent: the harvesting logic (via standardized protocols like OAI-PMH) and the storage method (in the open repository, with no hosting on Episciences) are the same for all software solutions.
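The part that stays constant can be pictured as reading a Dublin Core record out of an OAI-PMH response. The following is a minimal sketch under stated assumptions: the record, identifiers, and function name are invented for illustration, and only the XML namespaces are taken from the OAI-PMH and Dublin Core standards.

```python
import xml.etree.ElementTree as ET

DC_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_dc_record(record_xml: str) -> dict:
    """Return the header fields and a few Dublin Core elements of one record."""
    root = ET.fromstring(record_xml)
    header = root.find(".//oai:header", DC_NS)
    dc = root.find(".//oai_dc:dc", DC_NS)

    def values(tag: str) -> list[str]:
        # Dublin Core elements are repeatable, so always return a list.
        return [el.text.strip() for el in dc.findall(f"dc:{tag}", DC_NS) if el.text]

    return {
        "oai_identifier": header.findtext("oai:identifier", namespaces=DC_NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=DC_NS),
        "titles": values("title"),
        "creators": values("creator"),
        "identifiers": values("identifier"),
    }


# Hypothetical record; the document itself stays in the repository.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:repo.example.org:1234</identifier>
    <datestamp>2024-05-01</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An Example Preprint</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:identifier>https://repo.example.org/record/1234</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""

record = parse_dc_record(SAMPLE_RECORD)
```

Only the metadata and a link back to the repository are kept, which matches the principle that nothing is hosted on the journal platform.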

However, certain aspects of the connection require adaptation to the technical behaviors specific to each repository. For example:

  • The analysis and extraction of metadata must adapt to how the repository structures them.
  • The processing and cleaning of metadata depend on the rules and mandatory fields established by the repositories.
  • Version management and the assignment of identifiers to these versions vary across repositories. Episciences adapts to these rules to retrieve, display, and link versions transparently for users.
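To make the version point concrete, here is a small sketch of one possible rule: identifiers that carry a `v<number>` suffix, as arXiv identifiers do, are split into a base identifier and a version number, while identifiers without such a suffix are left unversioned. The function names are hypothetical, and a real connector would need a separate rule per repository, since others assign an entirely new identifier to each version.

```python
import re
from collections import defaultdict
from typing import NamedTuple, Optional


class VersionedId(NamedTuple):
    base: str
    version: Optional[int]


# Assumption: the version is a trailing "v" followed by digits.
_VERSION_SUFFIX = re.compile(r"^(?P<base>.+?)v(?P<version>\d+)$")


def split_version(identifier: str) -> VersionedId:
    """Split 'base' and 'version' when the identifier ends in v<digits>."""
    match = _VERSION_SUFFIX.match(identifier)
    if match:
        return VersionedId(match.group("base"), int(match.group("version")))
    return VersionedId(identifier, None)


def group_versions(identifiers: list[str]) -> dict[str, list[int]]:
    """Map each base identifier to the sorted list of versions seen for it."""
    groups: dict[str, list[int]] = defaultdict(list)
    for identifier in identifiers:
        base, version = split_version(identifier)
        if version is not None:
            groups[base].append(version)
    return {base: sorted(versions) for base, versions in groups.items()}


versions = group_versions(["2401.01234v1", "2401.01234v3", "2401.01234v2"])
```

A suffix rule like this one can misfire on identifiers that merely happen to end in `v` and digits, which is one reason the logic has to be adapted repository by repository.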

The main challenges involve version management. The link between versions of the same article is not always explicit, which can lead to risks of duplication, loss of contextual information, and ad hoc processing that may hinder Episciences’ operations. Poor metadata quality or inconsistent identifier management can impede the identification of multiple deposits of the same article, necessitating the development of methods to reconcile them. In some cases, it is necessary to build version management logic within Episciences, which requires significant technical development (connectors, metadata comparison, etc.).
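One simple way to picture such a reconciliation method is a fingerprint built from normalized metadata. The sketch below is an assumption-laden illustration, not the method Episciences uses: the record fields (`id`, `title`, `first_author`) and the function names are hypothetical, and matching on title plus first author only flags candidates that a person or a stricter rule would still need to confirm.

```python
import re
import unicodedata
from collections import defaultdict


def _normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()


def fingerprint(title: str, first_author: str) -> str:
    """Build a comparison key that ignores case, accents, and punctuation."""
    return f"{_normalize(title)}|{_normalize(first_author)}"


def candidate_duplicates(records: list[dict]) -> list[list[str]]:
    """Group record ids whose fingerprints collide (possible multiple deposits)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        groups[fingerprint(rec["title"], rec["first_author"])].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical deposits of the same article with slightly different metadata.
deposits = [
    {"id": "a", "title": "Lattice-Based  Signatures", "first_author": "Müller"},
    {"id": "b", "title": "Lattice based signatures", "first_author": "Muller"},
    {"id": "c", "title": "An Unrelated Paper", "first_author": "Doe"},
]
duplicates = candidate_duplicates(deposits)
```

The weaker the metadata, the less reliable such a key becomes, which is why poor metadata quality translates directly into extra development effort.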

The validation process: engaging the platform’s user communities

In addition to internal testing phases, the process of deploying a new connection involves a pre-production deployment in an environment identical to the production system, accessible to journal editorial teams. They can conduct tests on the entire workflow without risk and suggest improvements to the Episciences team. Once the repository connection feature is deployed in the production environment, and given that Episciences’ code is open, bugs and improvement suggestions can be reported or proposed by journal contributors via the dedicated GitHub space.

Future perspectives: leveraging and optimizing existing developments

The developments carried out so far to connect Episciences with open repositories built on Dataverse, DSpace, and EPrints have made the platform compatible with three widely used solutions for data and document repository management. This compatibility significantly reduces the effort required to integrate other repositories or preprint servers that already use these software solutions. The OAI-PMH harvesting module, which enables standardized queries to repositories and retrieves responses in XML format to detect new deposits and updates, is compatible with all three solutions. The processing and normalization of metadata, which makes them available throughout the publishing workflow (submission, review, publication), are also applicable to all three software platforms.
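The detection of new deposits and updates can be sketched with the standard OAI-PMH `ListRecords` verb, its `from` argument, and the `resumptionToken` used for paging. Everything else here is hypothetical: the base URL, the function names, and the sample response, which is parsed from a string instead of being fetched. It does not reproduce the Episciences harvesting module.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumption token replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


def summarize_page(response_xml: str):
    """Return (changed ids, deleted ids, next resumption token or None)."""
    root = ET.fromstring(response_xml)
    changed, deleted = [], []
    for header in root.findall(".//oai:record/oai:header", OAI):
        identifier = header.findtext("oai:identifier", namespaces=OAI)
        (deleted if header.get("status") == "deleted" else changed).append(identifier)
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI)
    return changed, deleted, (token.strip() if token and token.strip() else None)


# Hypothetical page of results.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier><datestamp>2024-05-02</datestamp></header>
    </record>
    <record>
      <header status="deleted"><identifier>oai:repo.example.org:2</identifier><datestamp>2024-05-03</datestamp></header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

changed, deleted, next_token = summarize_page(SAMPLE_PAGE)
```

Because the request and response shapes are defined by the protocol, the same two functions would apply unchanged to any repository that exposes OAI-PMH, whatever software it runs on.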

The connection with open repositories can be further enhanced through the integration of the COAR Notify protocol. This protocol allows infrastructures to notify each other in real time when a research object (article, dataset, etc.) is deposited or updated, thereby improving interoperability and synchronization between platforms. The protocol will be implemented between RepositoriUM and Episciences. It pushes interoperability further by making metadata exchanges bidirectional, adding contextual information to basic metadata, and enabling immediate synchronization.
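To give a rough idea of what such a notification looks like, the sketch below assembles a JSON-LD message in which a repository offers a preprint to a journal for review. COAR Notify builds on the W3C Activity Streams 2.0 vocabulary; beyond that, the context URLs, the `type` values, the helper name, and all example URLs are assumptions recalled from the protocol's "Request Review" pattern and should be checked against the current specification before use.

```python
import json
import uuid


def build_review_request(preprint_url: str, author_name: str,
                         repository_url: str, repository_inbox: str,
                         journal_url: str, journal_inbox: str) -> dict:
    """Assemble a hypothetical 'request review' notification as a plain dict."""
    return {
        # Assumed context URLs; verify against the COAR Notify specification.
        "@context": [
            "https://www.w3.org/ns/activitystreams",
            "https://purl.org/coar/notify",
        ],
        "id": f"urn:uuid:{uuid.uuid4()}",
        "type": ["Offer", "coar-notify:ReviewAction"],  # assumed type values
        "actor": {"type": "Person", "name": author_name},
        "object": {"id": preprint_url},          # the deposited preprint
        "origin": {                               # the repository sending the message
            "id": repository_url,
            "inbox": repository_inbox,
            "type": "Service",
        },
        "target": {                               # the journal platform receiving it
            "id": journal_url,
            "inbox": journal_inbox,
            "type": "Service",
        },
    }


# All URLs below are placeholders, not real endpoints.
notification = build_review_request(
    preprint_url="https://repository.example.org/record/1234",
    author_name="Jane Doe",
    repository_url="https://repository.example.org",
    repository_inbox="https://repository.example.org/inbox",
    journal_url="https://journal.example.org",
    journal_inbox="https://journal.example.org/inbox",
)
payload = json.dumps(notification, indent=2)
```

In such an exchange the repository would send the payload to the journal's inbox, and the journal could later answer with its own notification, which is what makes the exchange bidirectional.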

Thanks to these developments, Episciences now plans to integrate Digital CSIC (the repository managed by CSIC – Consejo Superior de Investigaciones Científicas, Spain – and developed on DSpace) as well as BAOBAB (the repository developed by WACREN – West and Central African Research and Education Network – and built with InvenioRDM).
