Automatic extraction of research infrastructures from the uploaded file: an easy way to complete your deposit

Written by Agnès Magron

The HAL submission form has been enhanced with a new metadata field that allows you to specify which research infrastructure was used during your research. HAL takes another step forward by now offering data entry assistance based on information extracted from the uploaded PDF file.

As a reminder, research infrastructures are certified by the Ministry of Higher Education and Research as part of the National Roadmap.

The list of infrastructures available to complete the submission is a closed list derived from data available on data.gouv. Data entry in the submission form is facilitated by an auto-completion feature: simply enter the acronym or part of the name, then select the appropriate option. Since the introduction of this feature last December, nearly 2,800 submissions have been completed with this information.
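To illustrate the auto-completion behaviour described above, here is a minimal sketch in Python. It is not HAL's implementation: the `Infrastructure` class, the `suggest` function and the sample entries are all hypothetical, and the real closed list comes from data.gouv. The sketch only assumes that a query matches when it appears in the acronym or in the name, ignoring case and accents.

```python
from dataclasses import dataclass
import unicodedata


@dataclass(frozen=True)
class Infrastructure:
    acronym: str
    name: str


# Hypothetical sample entries, for illustration only.
# The real closed list is derived from data available on data.gouv.
INFRASTRUCTURES = [
    Infrastructure("ALPHA", "Alpha Observatory for Ocean Studies"),
    Infrastructure("BETA-LAB", "Beta Laboratory for Imaging"),
    Infrastructure("GAMMA", "Gamma Network for Social Data"),
]


def normalize(text: str) -> str:
    """Strip accents and fold case so that 'Océan' matches 'ocean'."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.casefold()


def suggest(query: str, entries=INFRASTRUCTURES, limit: int = 10):
    """Return entries whose acronym or name contains the query."""
    q = normalize(query.strip())
    if not q:
        return []
    hits = [
        e for e in entries
        if q in normalize(e.acronym) or q in normalize(e.name)
    ]
    # Assumed ranking: acronyms starting with the query come first,
    # then alphabetical order by acronym.
    hits.sort(key=lambda e: (not normalize(e.acronym).startswith(q), e.acronym))
    return hits[:limit]


if __name__ == "__main__":
    for entry in suggest("imag"):
        print(entry.acronym, "-", entry.name)
```

Typing part of a name (`"imag"`) or an acronym (`"gam"`) returns the matching entries, from which the depositor would select the appropriate one.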

Extraction of research infrastructure mentions from the PDF file

The application, which already extracts descriptive metadata such as authors, titles, abstracts, journal titles and ANR funding, has been updated to also extract research infrastructures.

If one of these infrastructures is mentioned in the uploaded file (in the acknowledgements or funding sections), the application extracts the reference and checks whether the infrastructure is listed in the dataset available on data.gouv. If both conditions are met, the research infrastructure field of the submission form is filled in automatically.
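The two-step check described above (a mention in the text, then a lookup in the closed list) can be sketched as follows. This is a simplified illustration, not the application's actual extraction pipeline: the `find_infrastructures` function, the matching rules and the sample entries are assumptions made for the example, and the input is assumed to be the plain text of an acknowledgements or funding section.

```python
import re

# Hypothetical closed list (acronym -> full name), for illustration only.
# The real list is derived from the dataset available on data.gouv.
KNOWN_INFRASTRUCTURES = {
    "ALPHA": "Alpha Observatory for Ocean Studies",
    "BETA-LAB": "Beta Laboratory for Imaging",
}


def find_infrastructures(section_text: str, known=KNOWN_INFRASTRUCTURES):
    """Return the acronyms of listed infrastructures mentioned in the text."""
    folded_text = section_text.casefold()
    found = []
    for acronym, name in known.items():
        # Match the acronym only as a whole token, so that "ALPHA"
        # is not detected inside "ALPHABET" or "ALPHA-2".
        acronym_pattern = r"(?<![\w-])" + re.escape(acronym) + r"(?![\w-])"
        mentioned_by_acronym = re.search(acronym_pattern, section_text) is not None
        # The full name is matched case-insensitively.
        mentioned_by_name = name.casefold() in folded_text
        if mentioned_by_acronym or mentioned_by_name:
            found.append(acronym)
    return found


if __name__ == "__main__":
    acknowledgements = "We thank the BETA-LAB facility and the ALPHABET team."
    print(find_infrastructures(acknowledgements))
```

Only infrastructures that are both mentioned and present in the closed list are returned; anything else is ignored, which mirrors the two conditions required before the form is pre-filled.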

As with any information extracted from the file to populate the metadata, the depositor is prompted to verify it before finalising the submission.

The aim of this enhancement is to simplify the submission process. It is part of the Equipex+ HALiance project, specifically Work Package 3, which aims to extract metadata and identifiers from uploaded files and automatically enrich the HAL database. The CCSD is working with Science-Miner, a company that develops open source tools for scientific text mining.
