Statistical and Machine Learning Model applications in Credit Scoring. Empirical evidences
ARMARI, NICCOLÒ
2021/2022
Abstract
After the Global Financial Crisis of 2008, caused mainly by the US subprime mortgage crisis, many market participants began to worry about the use of financial instruments such as Credit Default Swaps, in which the creditworthiness of the counterparty was not adequately assessed. This crisis was followed within a few years by the European Sovereign Debt Crisis and, more recently, by the COVID-19 Crisis and the Energy Crisis triggered by the Russia-Ukraine war. This series of disruptive events had serious repercussions on the economic and banking system, leading to scarce funding, the impact of a deteriorating real economy on credit quality, and the compression of operating margins. In this context, credit risk management has become an extremely important factor that attracts significant attention from market participants. To manage risky credit exposures effectively and optimize profits, financial institutions have developed a number of statistical and machine learning techniques for building credit rating models. The objective of this work is to present the statistical techniques used for Credit Scoring, focusing on one of the most widely used models for counterparty credit evaluation. The work is divided into three chapters. The first chapter has two parts: the first introduces the main concepts of Machine Learning, while the second analyzes the concept of credit risk. After an initial focus on the definition of credit risk and on its management through the analysis of its components (expected and unexpected loss), the chapter ties the discussion closely to the evolution of banking regulation in recent years (from Basel I to Basel III). The second chapter provides a general overview of credit scoring techniques.
The overview is more conceptual in its first part and more analytical in its second, in order to show both the reasons for and the advantages of using credit scoring and, above all, its effect on credit performance. Classification models such as Logistic Regression and Naive Bayes are presented through a discussion of the phases of model construction (sample selection, choice of explanatory variables, model estimation and verification of predictive capacity). The third and final chapter examines Credit Scoring in more depth by presenting a specific model and its practical application. The chapter is analytical and purely applied in character: a sample of European companies classified through the MORE framework is chosen for the application of the models analyzed earlier. The choice of SMEs as the subjects of the sample is not accidental: especially in Italy, the close relationship between banks and enterprises reflects some structural characteristics of our economy, above all the fragmentation of the productive fabric into numerous SMEs that cannot access the capital market directly. The empirical analysis, carried out with the WEKA software and with the help of the STATA software, examines in practice the relationship between the balance-sheet and macroeconomic indicators chosen as explanatory variables and the classification variable Y (MScore); the predictive capacity of the model and the significance of the individual regressors are also evaluated.
Users may download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY NC ND license.
For further information, or to check whether the file is available, write to: unitesi@unipv.it.
https://hdl.handle.net/20.500.14239/2453