The aim of this work is to present a method for performing a market analysis, more precisely a customer segmentation. The method applies two unsupervised clustering techniques for topic modeling to text scraped from the customers' websites and compares their results. While big data technologies are widely adopted in certain industries, first and foremost tech, the literature shows that managers are still far from embracing them to turn their companies into data-driven organizations. According to the existing literature, wider adoption of big data faces four main challenges: the lack of skills to manage and analyze big data; unsuitable company cultures; the difficulty of adapting decision making to make full use of the insights that big data provides; and the extensive adjustments that the organizational structure and existing processes would need to go through. This work therefore aims to demonstrate how big data technologies can be integrated into existing tasks (such as market analysis and customer segmentation), possibly helping to develop the skills, the company culture and the managerial abilities required for wider adoption. The customer segmentation has been performed on a set of companies that manufacture lifting and handling equipment (NACE code 2822), based on the product line in which they specialize. The code is written in Python. A large portion of it is dedicated to data collection and data cleaning, and the remaining part clusters the data and explores the results. Topic modeling is performed with the Gensim library's implementation of the Latent Dirichlet Allocation (LDA) model and with the Top2Vec (T2V) algorithm developed by Dimo Angelov. Because the two techniques are based on different approaches, the insights each one provides can be compared.
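The abstract notes that most of the code deals with data collection and cleaning. As a rough illustration only, the following standard-library sketch shows one way scraped HTML could be reduced to tokens before topic modeling. The `VisibleTextParser` class, the `html_to_tokens` helper and the tiny stopword list are hypothetical and are not taken from the thesis code.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list; a real pipeline would need a
# fuller list (and possibly lemmatization) for every language the sites use.
STOPWORDS = {"the", "and", "for", "our", "with", "of", "a", "to", "in", "we", "are"}


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_tokens(html, min_len=3):
    """Turn raw HTML into lowercase alphabetic tokens without stopwords."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if len(t) >= min_len and t not in STOPWORDS]


page = """<html><head><style>body {color: red}</style></head>
<body><h1>Our Cranes</h1><p>We build overhead cranes and hoists for the steel industry.</p>
<script>var x = 1;</script></body></html>"""
print(html_to_tokens(page))
# → ['cranes', 'build', 'overhead', 'cranes', 'hoists', 'steel', 'industry']
```

Dropping script and style contents keeps CSS and JavaScript identifiers out of the vocabulary. Such tokens would otherwise show up as spurious topic words shared by every site built on the same template.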
Both techniques show promising results. LDA is the more interpretable of the two, while T2V is well suited to discovering previously unknown niches. The results of this work show that a customer segmentation analysis can be performed with the aid of text analytics techniques, possibly yielding additional insights and speeding up a corporation's business-as-usual activities.
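As described by Angelov, Top2Vec embeds documents and words in a shared space and finds dense clusters of document vectors (UMAP dimensionality reduction followed by HDBSCAN). It takes each cluster's centroid as a topic vector and labels that vector with the nearest word vectors. The sketch below covers only that last step and assumes the cluster labels are already known. The two-dimensional vectors and the helper names are made up for illustration, since the real library learns much higher-dimensional embeddings.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def topic_words(doc_vectors, labels, word_vectors, top_n=2):
    """For each cluster, return the words closest to the cluster centroid."""
    topics = {}
    for label in sorted(set(labels)):
        if label == -1:  # HDBSCAN-style noise label: not a topic
            continue
        members = [vec for vec, lab in zip(doc_vectors, labels) if lab == label]
        topic_vec = centroid(members)
        ranked = sorted(word_vectors,
                        key=lambda w: cosine(word_vectors[w], topic_vec),
                        reverse=True)
        topics[label] = ranked[:top_n]
    return topics


# Hypothetical toy embeddings: one axis ~ "cranes", the other ~ "forklifts".
word_vectors = {"crane": [1.0, 0.1], "hoist": [0.9, 0.2],
                "forklift": [0.1, 1.0], "pallet": [0.2, 0.9]}
doc_vectors = [[0.95, 0.15], [0.9, 0.1], [0.1, 0.95], [0.15, 0.9], [0.5, 0.5]]
labels = [0, 0, 1, 1, -1]  # last company treated as noise
topics = topic_words(doc_vectors, labels, word_vectors)
print(topics)
# → {0: ['crane', 'hoist'], 1: ['forklift', 'pallet']}
```

Because the number of topics follows from the density of the clusters instead of being fixed in advance, a small but tight group of company websites can surface as its own topic. This is one plausible reason T2V suits the discovery of niches.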
The aim of this work is to present a market analysis, more specifically a customer segmentation, carried out with unsupervised clustering techniques. Some multinational companies have been able to fully reap the benefits of big data technologies. The literature, however, highlights that many managers still have doubts about the implications of using big data technologies in business processes. The aim of this thesis is to demonstrate how big data technologies can be integrated into common business processes. This can be done without reorganizing the entire organization, while allowing experimentation with new technologies and decision-making processes. To this end, Python code was written to collect data on companies in the industry identified by NACE code 2822 from their websites. The code then segmented these companies using two unsupervised clustering techniques: the Latent Dirichlet Allocation model and Top2Vec. The final result demonstrates that it is possible to benefit from implementing big data technologies without having to overhaul business processes.
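In this work LDA is run through Gensim. To make the underlying model concrete without that dependency, the following is a toy collapsed Gibbs sampler for LDA. This is a different inference method from Gensim's `LdaModel`, which uses online variational Bayes. The sketch also shows one simple way to turn topic distributions into segments, assigning each company to its highest-probability topic. That rule is an assumption, not necessarily the one used in the thesis. The company names, token lists and hyperparameters are hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic distributions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(num_topics)]  # word counts per topic
    nk = [0] * num_topics                       # total tokens per topic
    z = []                                      # current topic of every token
    for d, doc in enumerate(docs):              # random initial assignment
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z = t | rest) ∝ (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Smoothed document-topic distributions (theta).
    return [[(ndk[d][t] + alpha) / (len(doc) + num_topics * alpha)
             for t in range(num_topics)]
            for d, doc in enumerate(docs)]


def dominant_topic(theta_row):
    """Index of the highest-probability topic for one document."""
    return max(range(len(theta_row)), key=theta_row.__getitem__)


# Hypothetical cleaned token lists, one per company website.
companies = {
    "company_a": ["crane", "hoist", "crane", "girder", "hoist", "crane"],
    "company_b": ["hoist", "crane", "girder", "crane", "hoist", "girder"],
    "company_c": ["forklift", "pallet", "truck", "forklift", "pallet", "truck"],
    "company_d": ["pallet", "forklift", "truck", "pallet", "forklift", "forklift"],
}
theta = lda_gibbs(list(companies.values()), num_topics=2)
segments = {}
for name, row in zip(companies, theta):
    segments.setdefault(dominant_topic(row), []).append(name)
print(segments)
```

With Gensim, each company's distribution would instead come from `LdaModel.get_document_topics`. A company whose probability mass is split across topics may deserve a closer look rather than a hard assignment.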
Data-driven Decision Making: a model for the use of text analytics in B2B market analysis
PERCIVALLE, STEFANO
2021/2022
Users may download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY-NC-ND license.
For more information, or to check whether the file is available, write to: unitesi@unipv.it.
https://hdl.handle.net/20.500.14239/2351