Artificial intelligence, scientific publications and open science

Written by Léo Raimbault

The field of research is undergoing a profound transformation driven by artificial intelligence (AI). These technologies are changing research methods, data management and even the writing of scientific papers. Tasks that used to take weeks can now be completed in hours with AI tools.

AI offers many benefits: generating new ideas, analysing data faster, reducing human error and improving the accuracy of results. These advances are increasing both productivity and reliability in the academic world.

This was the backdrop for the scientific day organised as part of the fourth HAL Partners’ Assembly in 2024. The day explored the various uses of AI in the fields of research, conservation and scientific publication, while questioning the ethical and legal implications of these practices.

The day was made possible by a scientific committee set up for the occasion and composed of leading figures in their fields:

  • Frédéric Bousefsaf, lecturer at the University of Lorraine and HAL ambassador
  • Cécile Méadel, Professor and Vice-President of Digital Studies at the University of Paris-Panthéon-Assas
  • Claire Nédellec, Director of Computer Science Research at the INRAE and the University of Paris-Saclay
  • Laurent Romary, research director at Inria

The day was recorded, and a replay of the introduction and the first session is available.

The other replays (audio and subtitles in French) are available on the CCSD’s Canal U channel.

Different use cases for AI

As an introduction to the study day, four flash presentations explored different uses of artificial intelligence. Each presentation highlighted a specific use case, illustrating how AI can transform research practices and the way libraries are organised. From analysing vast amounts of data to optimising documentary resources, these concrete examples gave an insight into the diversity of applications and the benefits they bring to the daily work of professionals.

Marie-Sophie Bercegeay, an expert in electronic resources at the Royal Library of Belgium, presented an automated cataloguing project: by simply capturing the title page, AI made it possible to integrate the references of 3 million books into the library’s online catalogue.

Cyril Labbé, senior lecturer at the University of Grenoble Alpes, presented SciDetect, a tool capable of detecting scientific articles automatically generated by artificial intelligence. It is based on the analysis of “tortured phrases”, that is, incoherent paraphrases of established terms (e.g. “artificial intelligence” becomes “false consciousness”), which are characteristic of automatically generated texts.
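
The idea of flagging incoherent paraphrases can be illustrated with a minimal sketch. It is not SciDetect's actual method, which the presentation summary does not detail: it simply assumes a small, hand-made lookup table of suspicious phrases and searches a text for them. The table, the function name `find_tortured_phrases` and the sample sentence are all hypothetical; only the first table entry comes from the example above.

```python
import re

# Hypothetical lookup table: suspicious phrase -> established term it replaces.
# The first entry is the example quoted in this article; the second is
# an illustrative assumption added for the sketch.
KNOWN_TORTURED_PHRASES = {
    "false consciousness": "artificial intelligence",
    "colossal information": "big data",
}


def find_tortured_phrases(text, table=KNOWN_TORTURED_PHRASES):
    """Return (phrase, expected_term, offset) tuples, sorted by offset."""
    hits = []
    for phrase, expected in table.items():
        # Whole-word, case-insensitive match of the suspicious phrase.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((phrase, expected, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


sample = "Recent advances in false consciousness rely on colossal information."
for phrase, expected, offset in find_tortured_phrases(sample):
    print(f"offset {offset}: '{phrase}' (expected term: '{expected}')")
```

A fixed list like this only catches paraphrases that are already known, so in practice such a list would need to be curated and extended over time.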

Géraldine Geoffroy, founder of SmartBibl.IA Solutions, presented her work on automated disciplinary indexing of scientific publications. These methods were used to generate indicators for a local barometer of publication openness, even before the launch of the national Open Science Barometer.

Sophie Schbath, Research Director at INRAE, presented Omnicrobe, an open access database that centralises structured information on the habitats, phenotypes and uses of bacteria. Based on automatic language processing, Omnicrobe extracts and standardises data from a variety of textual sources, making it possible to identify relationships between bacterial habitats and their potential uses.

Building on the use cases presented earlier, Pierre Senellart, Professor of Computer Science at the École Normale Supérieure, explained the main principles behind various artificial intelligence models: their theoretical foundations, how they work, their capabilities and their limitations.

A round table discussion

This round table brought together four experts to discuss the ethical and legal issues raised by artificial intelligence in research and libraries. The use of AI inevitably raises many questions and dilemmas relating to rights, scientific integrity, responsibility and, more generally, the impact of its use.

While AI remains a beneficial tool in many ways, it is crucial to consider the potential risks of copyright infringement when using AI-generated texts. To date, AI tools do not meet authorship standards, as they cannot be held legally responsible for the quality and validity of the information they produce. This also raises questions about the accuracy of the (re)generated information.

The widespread use of AI therefore calls for new legal and ethical frameworks that protect the work of researchers while guarding against potential misuse.

The four panellists were:

  • Mélanie Clément-Fontaine, Professor of Private Law at the University of Versailles Saint-Quentin-en-Yvelines
  • Liane Huttner, Lecturer in Private Law and Criminal Law at the University of Paris-Saclay
  • Cyril Labbé, Senior Lecturer in Computer Science at the University of Grenoble Alpes
  • Catherine Tessier, Scientific Integrity and Research Ethics Officer at the Office National d’Études et de Recherches Aérospatiales (French Aerospace Research Agency)

Using AI for CORE-GPT and HAL

The afternoon was devoted to the use of AI in services such as CORE-GPT and to concrete applications of AI within HAL.

Participants in this scientific day were privileged to welcome David Pride, Associate Researcher and member of the CORE team, from London. He presented how CORE-GPT automates article searches, synthesises complex works and generates relevant answers from a large scientific knowledge base. The aim of the CORE project is to index and provide unrestricted access to all open access research worldwide.

To conclude the two-day meeting, Nathalie Fargier and Yannick Barborini presented the use of artificial intelligence in the HAL platform. Yannick Barborini highlighted the integration of GROBID, a key tool that automates the extraction and structuring of data from PDF submissions. Capable of processing both traditional metadata (title, authors, affiliations, DOI, etc.) and complex text structures (paragraphs, references, captions, etc.), GROBID improves the quality and consistency of data in HAL. In this way, it simplifies the work of depositors and strengthens interoperability with other systems.
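
To give an idea of what structured output from such a tool looks like, here is a minimal sketch that reads header metadata from a TEI XML fragment, the format GROBID produces. The fragment is hand-written and heavily simplified for illustration, the function name `extract_header_metadata` is hypothetical, and nothing here describes how HAL itself integrates GROBID.

```python
import xml.etree.ElementTree as ET

# TEI namespace used in GROBID's XML output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written, simplified fragment (assumption: real output is much richer).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title level="a" type="main">An Example Title</title>
      </titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author>
              <persName><forename>Ada</forename><surname>Example</surname></persName>
            </author>
            <author>
              <persName><forename>Bob</forename><surname>Sample</surname></persName>
            </author>
          </analytic>
          <idno type="DOI">10.0000/example</idno>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def extract_header_metadata(tei_xml):
    """Return the title, author names and DOI found in a TEI header."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    authors = []
    for person in root.iterfind(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        forename = person.findtext("tei:forename", default="", namespaces=TEI_NS)
        surname = person.findtext("tei:surname", default="", namespaces=TEI_NS)
        # Join only the name parts that are actually present.
        authors.append(" ".join(part for part in (forename, surname) if part))
    doi = root.findtext(".//tei:idno[@type='DOI']", default="", namespaces=TEI_NS)
    return {"title": title, "authors": authors, "doi": doi}


print(extract_header_metadata(SAMPLE_TEI))
```

Having the title, authors and identifiers available as fields, rather than buried in a PDF, is what allows a deposit form to be pre-filled and the metadata to be exchanged with other systems.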

 

 

 
