Revising the clinical criteria for Dementia using explainable machine learning and counterfactuals.

CALLIKU, DOREN

2021/2022

Abstract

Background: The clinical protocols for Dementia are set by international working groups, which define the diseases based on their clinical experience and existing biomarkers. Publicly open datasets have made it possible to search for early markers of the various Dementia phenotypes. One line of research on these markers is based on machine learning models for the diagnosis of Dementia.

Objective: This thesis uses explainability analysis of computational models to investigate whether the clinical criteria defined by clinicians can be replicated from the patients' visit data. The boundary cases are then studied and defined through counterfactual instances.

Design: First, the clinical criteria for the main phenotypes of Dementia are presented. These include Mild Cognitive Impairment (MCI), Alzheimer’s Disease (AD), Frontotemporal Lobar Degeneration (FTLD), Dementia with Movement Disorders (DwMD), and Vascular Dementia (VaD). A processing pipeline was then defined, with optional steps for imputation, scaling, class-imbalance handling, and model optimization. The datasets were tested with different machine learning models: RandomForest, XGBoost, and LightGBM.

Setting: Two multi-center open datasets were mainly used for development and internal validation of the models: the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset and the National Alzheimer’s Coordinating Center (NACC) dataset. Both include multiple visits for heterogeneous groups of Dementia patients. They contain the data collected during the clinical interview, the questionnaires and assessments, and the diagnostic process. One dataset (ADNI) also included processed neuroimaging data. There are 45100 participants in the NACC dataset and 2294 participants in the ADNI dataset.

Main outcome measure: The weighted harmonic mean of recall and precision (F-beta score), measured on a Stratified-Group-k-Fold validated split, with k=4 for a 75-25% train-test split.

Main preliminary results:
1. The clinical criteria defined by the protocols are replicated to a large extent by computational models.
2. Advanced preprocessing steps do not add predictive value for gradient-boosted algorithms.
3. The LightGBM model outperforms the other models by a narrow margin.
4. The diagnostic models are more sensitive to disease groups like FTLD and DwMD, while the MCI-VaD-AD spectrum is more difficult to separate.
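The outcome measure above can be illustrated with a minimal sketch. It is not the thesis code: the function names `fbeta` and `group_folds`, the toy labels, and the participant IDs are all hypothetical. The split shown is a simplified group-aware k-fold that keeps every visit of a participant in the same fold; it does not stratify by diagnosis, which a Stratified-Group-k-Fold split (for example scikit-learn's `StratifiedGroupKFold`) would additionally do.

```python
from collections import defaultdict


def fbeta(y_true, y_pred, positive, beta=1.0):
    """F-beta for one class: weighted harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    # beta > 1 weights recall more heavily, beta < 1 weights precision more.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def group_folds(groups, k=4):
    """Assign each visit to one of k folds, keeping a participant's visits together.

    Simplification (assumption): folds are balanced greedily by visit count
    only, with no stratification by diagnosis.
    """
    visits_per_group = defaultdict(int)
    for g in groups:
        visits_per_group[g] += 1
    fold_sizes = [0] * k
    fold_of = {}
    # Place the largest participants first, always into the smallest fold.
    ordered = sorted(visits_per_group.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for g, n in ordered:
        target = fold_sizes.index(min(fold_sizes))
        fold_of[g] = target
        fold_sizes[target] += n
    return [fold_of[g] for g in groups]


# Hypothetical toy data: one row per visit, several visits per participant.
participants = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"]
folds = group_folds(participants, k=4)

# Holding out one of the four folds gives roughly a 75-25% train-test split.
test_fold = 0
train_idx = [i for i, f in enumerate(folds) if f != test_fold]
test_idx = [i for i, f in enumerate(folds) if f == test_fold]

y_true = ["AD", "AD", "MCI", "AD"]
y_pred = ["AD", "MCI", "MCI", "AD"]
print(fbeta(y_true, y_pred, positive="AD", beta=2.0))
```

Grouping by participant matters because both datasets contain repeated visits: if visits of the same person landed in both the training and the test portion, the score would be optimistically biased.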

Record

	Faculty/Department
	
				DIPARTIMENTO DI SCIENZE DEL SISTEMA NERVOSO E DEL COMPORTAMENTO
			
	Degree programme
	
				PSYCHOLOGY, NEUROSCIENCE AND HUMAN SCIENCES [05416]
			
	Academic year
	
				2021
			
	English title
	
				Revising the clinical criteria for Dementia using explainable machine learning and counterfactuals.
			
	Supervisor
	
				SALVATO, GERARDO
			
	Co-supervisor
	
				BOTTINI, GABRIELLA
			
	Appears in document types:
	
				Master's theses

Files in this item:

There are no files associated with this item.

Users may download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY NC ND license.
For more information, and to check whether the file is available, write to: unitesi@unipv.it.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14239/2422