Optimising HAL services with the ROR identifier for authors’ affiliations

Written by CCSD

Just as researchers have ORCID or idHAL, organisations have unique identifiers that information systems use to resolve ambiguities and link information. The ROR is one such identifier, dedicated to research organisations. Enriching the HAL database with this identifier enhances the interoperability of the platform and improves services such as automatic affiliation at deposit time. It is in this context that an enrichment campaign was recently carried out with the help of portal administrators and collection managers.

What is the ROR?

The Research Organization Registry (ROR) is a collaborative database containing identifiers and metadata for more than 102,000 organisations involved in research. The ROR identifier is used wherever associating authors with their affiliations is crucial: by publishers, in bibliographic databases and, of course, in open repositories such as HAL.

The ROR is one of four international open infrastructures supported this year by the French National Fund for Open Science.

There are currently 1,299 institutions and organisations identified with a ROR in HAL’s reference data (auréHAL).

Adding the ROR to the metadata describing research structures in auréHAL improves, as any other identifier would, the interoperability between HAL data and other sources of bibliographic metadata. This in turn makes HAL data easier for other systems to discover and retrieve, and optimises the performance of services that rely on the identification of research structures (the API and the automatic affiliation of authors).

Automatic affiliation as a use case

Not all publishers manage affiliations with this identifier, but those that do can add it to the metadata associated with the document’s DOI. When extracting the metadata associated with the DOI during the deposit, HAL can thus retrieve the ROR and check whether it already exists in its database.

ROR alignment

If the ROR already exists in the HAL database, HAL can automatically propose a reliable research structure for the author’s affiliation. Hence the value of having as many research structures as possible identified with a ROR.
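The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HAL's actual implementation: it assumes the DOI metadata has already been fetched and parsed into a dictionary shaped like a Crossref work record (each author has an `affiliation` list whose entries may carry an `id` list with `id-type` set to `ROR`). The sample record, the `KNOWN_STRUCTURES` table, the placeholder identifiers and the function names are all hypothetical.

```python
# Hypothetical extract of reference data: bare ROR ID -> structure name.
# The identifier below is a placeholder, not a real ROR.
KNOWN_STRUCTURES = {
    "0example01": "Example University (hypothetical record)",
}

# Hypothetical DOI metadata, shaped like a Crossref work record (assumption).
SAMPLE_WORK = {
    "author": [
        {
            "given": "Ada",
            "family": "Martin",
            "affiliation": [
                {
                    "name": "Example University",
                    "id": [{"id": "https://ror.org/0example01", "id-type": "ROR"}],
                }
            ],
        },
        # This author's publisher metadata carries no ROR at all.
        {"given": "Li", "family": "Chen", "affiliation": [{"name": "Unlisted Lab"}]},
    ]
}


def normalise_ror(value: str) -> str:
    """Reduce a ROR given as a URL or a bare ID to its lowercase bare ID."""
    return value.strip().lower().rstrip("/").rsplit("/", 1)[-1]


def ror_ids_by_author(work: dict) -> dict:
    """Collect the ROR IDs attached to each author's affiliations."""
    result = {}
    for author in work.get("author", []):
        parts = (author.get("given"), author.get("family"))
        name = " ".join(p for p in parts if p)
        rors = []
        for affiliation in author.get("affiliation", []):
            for identifier in affiliation.get("id", []):
                if identifier.get("id-type", "").upper() == "ROR":
                    rors.append(normalise_ror(identifier["id"]))
        result[name] = rors
    return result


def propose_affiliations(work: dict, known: dict) -> dict:
    """For each author, list the known structures matching their ROR IDs."""
    return {
        name: [known[ror] for ror in rors if ror in known]
        for name, rors in ror_ids_by_author(work).items()
    }


proposals = propose_affiliations(SAMPLE_WORK, KNOWN_STRUCTURES)
```

In this sketch an author whose metadata carries no ROR, or a ROR absent from the reference table, simply gets an empty list of proposals, which is why coverage of the reference data matters.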

Collaborative work for enrichment

When curating the research structures reference data, portal administrators can add the ROR identifier manually. For its part, the CCSD regularly carries out automatic enrichments, particularly in partnership with ABES for authors’ IdRef and ORCID identifiers. It is now extending this alignment work to research structures.

In March-April, the CCSD issued a call for participation in an enrichment campaign aimed at portal administrators and collection managers: from a list of institutions cross-referencing auréHAL data and ROR data, they were invited to verify and validate a proposed ROR. If the proposed ROR was not valid, they were asked to provide the correct one. This human verification was an essential step before any automatic import into the database.

The CCSD thanks everyone who participated and enabled the addition of 314 ROR identifiers to auréHAL!

This figure may seem low, but it hides another, much more substantial one: the number of updates to deposits. All the deposits that contain the name of one of the research structures concerned have been updated with this information, thus improving their discoverability (enriching the TEI, the API, the various exports, etc.). Because some institutions cover a wide scope, nearly 3 million updates have been made to deposits*.

Let’s share the methodologies…

The campaign also made it possible to evaluate the data extraction method: of the 329 structures checked manually, only 19 were declared invalid. The errors were often related to the names of the structures in auréHAL (e.g. the name of a laboratory mixed with that of its supervisory institution) or to structures missing from ROR.

The ROR data extraction relies on an automatic alignment method based on Elasticsearch, developed by the French Open Science Monitor (BSO) team and described here. To find a ROR matching a research structure in auréHAL, the tool performs an approximate comparison between the auréHAL metadata on the one hand and the ROR metadata on the other.
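To give an idea of what an approximate comparison involves, here is a minimal sketch in Python. It is not the BSO tool: it swaps the Elasticsearch-based matching for a simple character-level similarity ratio from the standard library's `difflib`, and it compares names only. The candidate records, the placeholder identifiers, the function names and the 0.85 threshold are all assumptions chosen for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def best_match(structure_name: str, candidates: dict, threshold: float = 0.85):
    """Return (ror_id, score) for the closest candidate name, or None.

    `candidates` maps a ROR ID to the list of names known for that
    organisation (main name, aliases, translated labels).
    """
    target = normalise(structure_name)
    best_id, best_score = None, 0.0
    for ror_id, names in candidates.items():
        for name in names:
            score = SequenceMatcher(None, target, normalise(name)).ratio()
            if score > best_score:
                best_id, best_score = ror_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None  # No candidate is close enough: leave it to a human.


# Hypothetical ROR-side records; the identifiers are placeholders.
ROR_CANDIDATES = {
    "0example01": ["Universite d'Exemple", "Example University"],
    "0example02": ["Institut Fictif"],
}

match = best_match("Université d'Exemple", ROR_CANDIDATES)
no_match = best_match("Laboratoire Inconnu", ROR_CANDIDATES)
```

A name-only comparison like this one is exactly where the errors reported in the campaign would arise, for instance when a laboratory name is mixed with that of its supervisory institution. Returning `None` below the threshold keeps doubtful cases for manual validation.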

The CCSD and BSO teams have analysed the results and feedback from this campaign and identified ways to improve and adjust the alignment tool, for example by taking into account geographical information and the URLs of the organisations’ websites to better identify matches between auréHAL and ROR. Improving the tool benefits the entire Higher Education and Research community.

This alignment work on research structures will continue, now at the laboratory level with the RNSR identifier (National Directory of Research Structures). All of this alignment work is part of work package 3 of the HALiance project, dedicated to the extraction and alignment of metadata.

*A deposit can be updated several times, so the sum of re-indexed documents overstates the number of distinct deposits updated. Update log: ROR update of 124 institutions (1,358,092 re-indexed documents); ROR update of 68 institutions (510,614 re-indexed documents); ROR update of 103 institutions (1,073,426 re-indexed documents); and ROR update of 19 institutions (18,413 re-indexed documents).
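The arithmetic behind these figures can be checked with a short Python sketch. The four batches are taken from the update log above; the two small sets at the end are hypothetical and only illustrate why a deposit re-indexed in several batches inflates the total.

```python
# (institutions updated, documents re-indexed) for each batch in the log.
UPDATE_LOG = [
    (124, 1_358_092),
    (68, 510_614),
    (103, 1_073_426),
    (19, 18_413),
]

total_institutions = sum(institutions for institutions, _ in UPDATE_LOG)
total_reindexed = sum(documents for _, documents in UPDATE_LOG)

print(total_institutions)  # 314, the number of ROR identifiers added
print(total_reindexed)     # 2960545, i.e. "nearly 3 million" updates

# Hypothetical example: deposit "hal-B" is affiliated with institutions
# from two different batches, so it is re-indexed twice.
batch_1 = {"hal-A", "hal-B"}
batch_2 = {"hal-B", "hal-C"}
updates = len(batch_1) + len(batch_2)      # 4 re-indexing operations
distinct_deposits = len(batch_1 | batch_2)  # only 3 distinct deposits
```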

 
