Natural Language Processing for the Automatic Generation of Business Process Documentation

In today’s data-driven era, Artificial Intelligence (AI) plays a crucial role in enabling organizations to analyze and extract value from the vast amounts of data generated daily. This thesis focuses on Natural Language Processing (NLP), a subfield of AI concerned with the interaction between computers and human language. Recent advances in NLP have significantly improved the processing and analysis of unstructured textual data, improving how information is accessed and interpreted. The aim of this work is to develop a documentation-generating agent capable of understanding the components of a business application workflow and answering user questions. The study begins with an overview of eLegere, an Italian low-code/no-code platform for Business Process Management, outlining its main features and architecture. It then introduces the mathematical foundations of NLP, focusing on the bag-of-words representation and the Naïve Bayes method for text classification. Subsequently, the thesis examines transformer-based models, which represent the current state of the art in NLP. Building on these concepts, it presents a model architecture for automated documentation generation, with particular attention to the tool-calling approach, prompt design, and typical response workflows. Finally, the performance of selected large language models (LLMs) is evaluated on a dedicated dataset, and the results are discussed with a view to possible future developments.
