This work falls within Process Outcome Research, a field that investigates processes and outcomes to understand which treatment is most effective for a specific person with a specific problem, and under what circumstances. Artificial intelligence is playing an increasingly important role in psychotherapy research, especially through the application of Machine Learning and Natural Language Processing systems. In this context, the specific objective of this work was to investigate three Topic Modeling algorithms, LDA (Latent Dirichlet Allocation), Top2Vec, and BERTopic, and to compare them to determine which is best suited to analyzing transcripts of individual therapies. The case analyzed consists of 28 sessions conducted using a cognitive neuropsychological approach. The sessions were audio and video recorded and transcribed in a Microsoft Word document following the transcription standards of Mergenthaler and Stinson (1992); a Microsoft Excel worksheet was then created in which the topics were labeled. The topic analyses were carried out following the phenomenological method of Giorgi (1985). The research was guided by two hypotheses: first, that the three models can extract topics that can be described with the labels developed in previous work, and that they can also identify new topics; second, that Top2Vec and BERTopic, given how they work, are expected to offer more specific and more comprehensible results. The results extracted from the models allowed for both qualitative and quantitative data analysis, offering important points for reflection.
Specifically, the labels developed in previous work proved applicable to the topics extracted by all three models, and BERTopic identified both very specific topics and two topics not previously encountered, one related to the temporal dimension and the other to geographical aspects. Top2Vec, by contrast, did not perform as expected. Despite the limitations of this study, many elements support the claim that artificial intelligence systems can contribute positively to psychotherapy research and to psychotherapeutic practice for the individual.
Comparing three topic modeling models for process analysis in a concluded psychotherapy: LDA, Top2Vec, and BERTopic. A contribution to Process Outcome Research
SACCHETTI, SARA
2022/2023
Users may download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY NC ND license.
For more information, and to check whether the file is available, write to: unitesi@unipv.it.
https://hdl.handle.net/20.500.14239/3390