Nowadays, a huge amount of data, both structured and unstructured, is available. Unstructured data has no predefined structure or representation and is the most common type of data that an organization holds. Electronic Credit Line Procedures (also known as PEF, Pratiche Elettroniche di Fido) are a good example of this type of data. The purpose of the analysis, performed during an internship at a well-known Italian bank, is to extract valuable and useful information from these textual documents using SAS Contextual Analysis. First, the literature was consulted for a general overview of text mining, its methods, and its applications in the financial sector. Then, after an explanation of how the SAS software works, the analysis is described from data preparation through to the creation of concept and category rules aimed at extracting pieces of information from the text and classifying the documents on the basis of that information. During the analysis it was necessary to address typical text mining difficulties such as ambiguous words and misspellings. Lastly, the categories created and the terms used to extract information are displayed in dashboards built with SAS Visual Analytics. With the analysis complete, future work concerns how to use the output of the model to make decisions and plan the commercial strategy.
The utility of text mining: the analysis of PEF documents using SAS Contextual
MURARA, SAMUEL
2018/2019
Users may download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY NC ND license.
For more information, or to check whether the file is available, write to: unitesi@unipv.it.
https://hdl.handle.net/20.500.14239/6557