Stratification of patients with coronary atherosclerosis using topological analysis approaches
ANTONUCCI, VALENTINA
2019/2020
Abstract
Cardiovascular (CV) risk scores developed to estimate a patient's 10-year CV risk have suboptimal predictive capacity and therefore do not solve the problem of accurately identifying patients at risk. The goal of this work is to identify patients who lie at the border between a normal state and a more severe state of cardiovascular disease, so that these patients can be monitored and examined with the highest priority. For this purpose, an unsupervised method, TDA Mapper, was used. TDA Mapper provides a continuous topographic representation of the data in the form of a network of nodes and edges. To optimize this representation, three components were used: a semi-supervised approach for data projection, a clustering algorithm evaluated with information criteria, and enrichment analysis. These approaches made it possible to immediately identify "frontier" nodes, located at the border between nodes populated by healthy patients and nodes populated by sick patients. The data were evaluated with respect to four class types; only for the NYHA class (yes/no) was it possible to identify the "frontier" nodes and to characterize the patients they contain on the basis of significant features such as smoking, hypertension and hypercholesterolemia. Thanks to the flexibility of TDA Mapper, subtle but important aspects of the data emerged that would otherwise be lost with a rigid unsupervised method such as clustering.

Users may download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY NC ND license.
For more information, or to check whether the file is available, write to: unitesi@unipv.it.
https://hdl.handle.net/20.500.14239/12849
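The abstract describes the Mapper pipeline only at a high level. The following is a minimal, standard-library Python sketch of the generic Mapper idea and of flagging "frontier" nodes, under assumptions that are not taken from the thesis: the lens is a precomputed 1-D value per patient (standing in for the semi-supervised projection); clusters inside each cover interval are found with fixed-threshold single linkage (a swapped-in technique, not a clustering algorithm selected with information criteria); a node is hypothetically called "frontier" when it has both a mostly-healthy and a mostly-sick neighbour; and enrichment is scored with a one-sided hypergeometric test. All names and parameters (`mapper_graph`, `frontier_nodes`, `hypergeom_enrichment`, `n_intervals`, `overlap`, `eps`) are illustrative.

```python
# Minimal Mapper sketch. The lens, clustering rule and frontier rule are
# assumptions for illustration, not the method used in the thesis.
import math
from itertools import combinations


def single_linkage(members, points, eps):
    """Cluster `members`: two points are linked when their Euclidean distance is <= eps."""
    parent = {k: k for k in members}

    def find(k):
        # Union-find root lookup with path halving
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for p, q in combinations(members, 2):
        if math.dist(points[p], points[q]) <= eps:
            parent[find(p)] = find(q)
    groups = {}
    for k in members:
        groups.setdefault(find(k), set()).add(k)
    return [frozenset(g) for g in groups.values()]


def mapper_graph(points, lens, n_intervals=4, overlap=0.5, eps=1.0):
    """Cover the 1-D lens with overlapping intervals, cluster the patients
    falling in each interval, and link clusters that share patients."""
    lo, hi = min(lens), max(lens)
    length = (hi - lo) / n_intervals
    nodes = []
    for i in range(n_intervals):
        # Extend each interval by overlap * length on both sides
        a = lo + i * length - overlap * length
        b = lo + (i + 1) * length + overlap * length
        members = [k for k, v in enumerate(lens) if a <= v <= b]
        nodes.extend(single_linkage(members, points, eps))
    # An edge joins two nodes whose clusters contain at least one common patient
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


def frontier_nodes(nodes, edges, labels):
    """Hypothetical rule: a node is 'frontier' if it has at least one mostly-healthy
    neighbour (sick fraction < 0.5) and one mostly-sick neighbour (> 0.5)."""
    sick_frac = [sum(labels[k] for k in n) / len(n) for n in nodes]
    nbrs = {i: set() for i in range(len(nodes))}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return [i for i in range(len(nodes))
            if any(sick_frac[j] < 0.5 for j in nbrs[i])
            and any(sick_frac[j] > 0.5 for j in nbrs[i])]


def hypergeom_enrichment(node, feature):
    """One-sided hypergeometric p-value that a binary feature (e.g. smoking)
    is over-represented among the patients in `node`."""
    N, K = len(feature), sum(feature)
    n, x = len(node), sum(feature[k] for k in node)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(x, min(K, n) + 1))
    return tail / math.comb(N, n)


if __name__ == "__main__":
    # Toy data: 8 patients on a line; the lens is simply the first coordinate
    pts = [(float(x), 0.0) for x in range(8)]
    lens = [p[0] for p in pts]
    sick = [0, 0, 0, 0, 1, 1, 1, 1]
    nodes, edges = mapper_graph(pts, lens)
    print(frontier_nodes(nodes, edges, sick))  # [1, 2]
```

On this toy line of eight patients the graph is a path of four nodes, and the two middle nodes, each touching both a mostly-healthy and a mostly-sick node, are the ones flagged as frontier nodes; `hypergeom_enrichment` can then be applied to those nodes for features such as smoking or hypertension.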