This study investigates the application of advanced machine learning techniques to predict corporate bankruptcy among unlisted firms, using financial data for 2014–2024 obtained from the ORBIS database. Unlisted companies constitute a critical yet under-researched segment of the economy because of limited disclosure requirements and the absence of standardized reporting frameworks. Early detection of financial distress in such firms is essential for credit institutions, investors, and policymakers seeking to mitigate systemic risk and strengthen financial stability. A curated dataset of private firms was built through systematic data cleaning, transformation, and feature engineering of key financial indicators such as liquidity, cash ratio, debt ratio, working capital, EBITDA, and intangible ratio. Bankruptcy was defined by a balance-sheet criterion: a firm is classified as bankrupt when its total liabilities exceed its total assets (i.e., when its equity is negative). The dataset was divided into training and testing subsets, and six supervised learning models were implemented: logistic regression, support vector machine (SVM), random forest, XGBoost, LightGBM, and a feedforward neural network. Model performance was evaluated primarily with the Area Under the Receiver Operating Characteristic Curve (AUC). The random forest achieved the highest individual AUC (≈ 0.952), while an ensemble of random forest, XGBoost, and LightGBM reached 0.955, the best discriminatory performance of all the models tested. Beyond the traditional AUC analysis, three Rank Graduation metrics, namely Rank Graduation Accuracy (RGA), Rank Graduation Robustness (RGR), and Rank Graduation Explainability (RGE), were applied to assess model stability and interpretability.
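The abstract lists the indicators but not their formulas. The minimal sketch below assumes common textbook definitions (liquidity as the current ratio, debt ratio as total liabilities over total assets, intangible ratio as intangible assets over total assets) and uses hypothetical field names such as `total_liabilities` that are not ORBIS variable codes. It shows how one firm-year record could be turned into features and into the balance-sheet bankruptcy label; `BalanceSheet`, `engineer_features`, `is_bankrupt`, and `safe_div` are illustrative names, not the thesis code.

```python
from dataclasses import dataclass


@dataclass
class BalanceSheet:
    """One firm-year record. Field names are hypothetical, not ORBIS codes."""
    total_assets: float
    total_liabilities: float
    current_assets: float
    current_liabilities: float
    cash: float
    intangible_assets: float
    ebitda: float


def safe_div(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den != 0 else float("nan")


def engineer_features(bs: BalanceSheet) -> dict[str, float]:
    """Compute the abstract's indicators using assumed textbook definitions."""
    return {
        "liquidity": safe_div(bs.current_assets, bs.current_liabilities),  # current ratio
        "cash_ratio": safe_div(bs.cash, bs.current_liabilities),
        "debt_ratio": safe_div(bs.total_liabilities, bs.total_assets),
        "working_capital": bs.current_assets - bs.current_liabilities,  # absolute level
        "ebitda": bs.ebitda,  # taken as reported
        "intangible_ratio": safe_div(bs.intangible_assets, bs.total_assets),
    }


def is_bankrupt(bs: BalanceSheet) -> int:
    """Balance-sheet criterion: 1 if total liabilities exceed total assets."""
    return int(bs.total_liabilities > bs.total_assets)


if __name__ == "__main__":
    # Illustrative numbers only, not drawn from the thesis data.
    firm = BalanceSheet(
        total_assets=1000.0, total_liabilities=1200.0,
        current_assets=300.0, current_liabilities=400.0,
        cash=50.0, intangible_assets=100.0, ebitda=-20.0,
    )
    print(engineer_features(firm)["debt_ratio"], is_bankrupt(firm))  # 1.2 1
```

Under these assumed definitions the label equals `debt_ratio > 1` whenever total assets are positive, so which fiscal year supplies the features and which supplies the label is a modelling choice this sketch leaves open.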
RGA validated the consistency of the predictive ranking across models; RGR measured robustness to data perturbations, highlighting the stability of ensemble and tree-based methods; and RGE quantified the explanatory contribution of the financial ratios, identifying liquidity and leverage as the most influential features. Overall, the findings show that machine learning can provide a reliable, interpretable, and reproducible framework for assessing bankruptcy risk in private markets using only financial statement data. Integrating RGA, RGR, and RGE adds both analytical depth and trustworthiness, offering a novel and comprehensive approach to predictive financial risk modeling.
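The abstract does not give the Rank Graduation formulas, so the following is a hedged sketch of one reading of them, not the thesis implementation. It assumes that RGA compares the cumulative curve of the reference values `y` ordered by the predictions (the concordance curve) against the curves obtained by ordering `y` ascending (Lorenz) and descending (dual Lorenz); that RGR is the RGA between predictions on original and on perturbed data; and that RGE for a feature k is one minus the RGA between full-model predictions and predictions from a model without k. Averaging `y` within tied predictions is also an assumption, and all function names are hypothetical. With a 0/1 target this construction reduces to the Mann-Whitney AUC with ties counted as one half, which `auc_pairwise` checks on toy data.

```python
from collections.abc import Sequence

# Sketch of Rank Graduation metrics under stated assumptions; not a reference implementation.


def _sum_of_cumsums(values: Sequence[float]) -> float:
    """Discrete area under the cumulative curve: sum_j v_j * (n - j)."""
    n = len(values)
    return sum(v * (n - j) for j, v in enumerate(values))


def _concordance_values(y: Sequence[float], y_hat: Sequence[float]) -> list[float]:
    """Order y by ascending y_hat; within tied predictions use the mean of y (assumption)."""
    idx = sorted(range(len(y)), key=lambda i: y_hat[i])
    out: list[float] = []
    start = 0
    while start < len(idx):
        end = start
        while end + 1 < len(idx) and y_hat[idx[end + 1]] == y_hat[idx[start]]:
            end += 1
        group = [y[i] for i in idx[start:end + 1]]
        out.extend([sum(group) / len(group)] * len(group))
        start = end + 1
    return out


def rga(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Assumed RGA: 1 when y_hat orders y perfectly, 0 when it reverses it."""
    lorenz = _sum_of_cumsums(sorted(y))              # y ascending: minimum area
    dual = _sum_of_cumsums(sorted(y, reverse=True))  # y descending: maximum area
    if dual == lorenz:
        raise ValueError("RGA is undefined for a constant reference vector")
    concordance = _sum_of_cumsums(_concordance_values(y, y_hat))
    return (dual - concordance) / (dual - lorenz)


def rgr(pred: Sequence[float], pred_perturbed: Sequence[float]) -> float:
    """Assumed RGR: how well perturbed-data predictions keep the original ranking."""
    return rga(pred, pred_perturbed)


def rge(pred_full: Sequence[float], pred_without_k: Sequence[float]) -> float:
    """Assumed RGE for feature k: near 0 when removing k leaves the ranking unchanged."""
    return 1.0 - rga(pred_full, pred_without_k)


def auc_pairwise(y: Sequence[int], score: Sequence[float]) -> float:
    """Mann-Whitney AUC with ties counted as 1/2, used only as a cross-check."""
    pos = [s for s, t in zip(score, y) if t == 1]
    neg = [s for s, t in zip(score, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [0, 0, 1, 1, 0, 1]                  # toy labels
    s = [0.1, 0.4, 0.4, 0.8, 0.4, 0.9]      # toy scores with a tie at 0.4
    print(round(rga(y, s), 4), round(auc_pairwise(y, s), 4))  # 0.8889 0.8889
```

Because only the ordering of the predictions enters the concordance curve, any strictly increasing transformation of the scores (for example, probabilities versus log-odds) leaves RGA, RGR, and RGE unchanged.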
Keywords: Bankruptcy prediction, Machine learning, Financial ratios, Random Forest, XGBoost, LightGBM, Neural networks, Model robustness, Explainability, RGA, RGR, Unlisted firms, Financial distress.
Artificial Intelligence Methods for Corporate Bankruptcy Prediction in Private Firms
BAYAT PATAPEH, NASTARAN
2024/2025
| File | Size | Format |
|---|---|---|
| THESIS_1 (8).pdf (open access) | 2.15 MB | Adobe PDF |
Users may download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY-NC-ND license.
For further information, or to check whether the file is available, write to: [email protected].
https://hdl.handle.net/20.500.14239/34849