Classification of diabetic patients through natural language processing of clinical reports
CAMPANELLA, MAURO
2018/2019
Abstract
Diabetes is one of the three health emergencies identified by the United Nations (UN) and the World Health Organization (WHO), along with malaria and tuberculosis. Growth is driven above all by type 2 diabetes, which accounts for about 90% of cases in Italy; in Italy, as in other countries, it is strongly linked to excess weight, itself the result of overeating and insufficient physical activity. In 1985 there were about 1.5 million known cases of diabetes in Italy; today there are close to 4 million. In 30 years the number has more than doubled, reaching roughly one case per 16 residents. This thesis aims to develop models that classify diabetic patients based on a textual analysis of their medical reports. Text Mining is a relatively recent research area whose main objective is to extract information automatically from digital textual resources. Exploiting this kind of unstructured data required Natural Language Processing (NLP) algorithms, which, thanks to recent developments in Deep Learning, have become an effective tool for handling very large, unstructured databases. The results show that the text of the reports contains information that contributes meaningfully both to classification based on the full set of reports and to the prediction of some categories of comorbidity. Classifiers based on Text Mining techniques can therefore extract useful information, both for analyzing medical reports automatically and for building predictive models that combine quantitative data with unstructured information. Thanks to the kind collaboration of the Maugeri Scientific Clinical Institutes (ICSM) of Pavia, 17,694 reports belonging to 9,727 diabetic patients were used for this study.

Users may download and share the full-text documents available in UNITESI UNIPV under the terms of the Creative Commons CC BY-NC-ND license.
For more information, or to check whether the file is available, write to: unitesi@unipv.it.
https://hdl.handle.net/20.500.14239/23172