The thesis proposes a single, reproducible pipeline to evaluate mortgage approval models on the 2017 HMDA dataset for the states of California and New York, using the bank’s historical decision as the target variable. Five machine learning models are compared (Logistic Regression, an MLP neural network, Random Forest, LightGBM, and XGBoost) to identify which best captures the patterns of an imbalanced dataset with many approved and few denied loans. Because all models share the same pipeline, the comparison is fair and direct. In addition to conventional metrics (accuracy, precision, recall, ROC-AUC), SAFE AI diagnostics (RGA, RGE, RGR) are used to assess where the ranking is informative, how stable it is, and how explainable it remains. These diagnostics are extended with a new metric, RGAp, designed to evaluate residual accuracy under targeted data loss. Next, the analysis focuses on identifying “unjustified denials” through the False Positive Rate (FPR). To make these errors measurable and comparable, the fixed 0.50 threshold adopted in the common pipeline is replaced by a threshold chosen on the validation set using the Youden index, which yields a realistic share of applications classified as denied. Differences in FPR are then estimated across race, income, gender, and ethnicity groups, and a transparent corrective procedure is tested to align FPR values. Finally, to simulate an economic downturn, a counterfactual stress test is applied to the best-performing model (XGBoost) by decreasing income and increasing the requested loan amount in the test data only, assessing how calibration and stability change while the model is held fixed. The contribution is a coherent framework that links performance, fairness, and robustness, offering a governance-oriented approach to mortgage approval modeling.
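As an illustration of the threshold and fairness steps described in the abstract, the following is a minimal sketch and not the thesis code. It assumes that the positive class (label 1) is a denial, that scores are predicted denial probabilities, and that the Youden index is J = TPR - FPR; the function names and the toy data are hypothetical.

```python
import numpy as np

def youden_threshold(y_true, scores):
    """Return the observed score that maximizes J = TPR - FPR.

    Assumption: label 1 = denied, label 0 = approved, and an
    application is classified as denied when score >= threshold.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = y_true == 1
    neg = ~pos
    best_t, best_j = 0.5, -np.inf
    for t in np.unique(scores):
        pred = scores >= t
        tpr = (pred & pos).sum() / pos.sum()
        fpr = (pred & neg).sum() / neg.sum()
        if tpr - fpr > best_j:
            best_t, best_j = float(t), tpr - fpr
    return best_t

def fpr_by_group(y_true, scores, groups, threshold):
    """FPR per group: share of approved loans (label 0) flagged as denied."""
    y_true, scores, groups = map(np.asarray, (y_true, scores, groups))
    pred = scores >= threshold
    out = {}
    for g in np.unique(groups):
        neg = (groups == g) & (y_true == 0)
        out[str(g)] = float((pred & neg).sum() / neg.sum()) if neg.sum() else float("nan")
    return out

# Toy validation data (hypothetical): choose the threshold here only.
y_val = [0, 0, 0, 0, 1, 1]
s_val = [0.1, 0.2, 0.3, 0.6, 0.55, 0.9]
threshold = youden_threshold(y_val, s_val)

# Toy test data (hypothetical): apply the fixed threshold and compare groups.
y_test = [0, 0, 0, 0, 1, 1]
s_test = [0.2, 0.7, 0.4, 0.5, 0.8, 0.3]
g_test = ["A", "A", "B", "B", "A", "B"]
group_fpr = fpr_by_group(y_test, s_test, g_test, threshold)
print(threshold, group_fpr)
```

Selecting the threshold on validation data and applying it unchanged to the test data keeps the group-level FPR comparison free of test-set tuning.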
Integrating SAFE AI Metrics in the Mortgage Context: Performance, Fairness, and Stability under Stress Scenarios
RIGAMONTI, CHIARA
2024/2025
| File | Size | Format | |
|---|---|---|---|
| tesi_definitiva_upload (1).pdf (not available). Description: Thesis (in English) | 1.24 MB | Adobe PDF | Request a copy |
Users may download and share the full-text documents available in UNITESI UNIPV in compliance with the Creative Commons CC BY NC ND license.
For more information, and to check whether the file is available, write to: unitesi@unipv.it.
https://hdl.handle.net/20.500.14239/31909