Deep Learning-Based Cryptocurrency Price Prediction: A Comprehensive Study on the Integration of Financial Time Series and Social Media Sentiment
MERCAN, ENES CAN
2023/2024
Abstract
This thesis investigates whether combining deep learning models with social media sentiment analysis improves price prediction in cryptocurrency markets. The cryptocurrency market is a challenging environment for traditional financial forecasting methods because of its high volatility and its sensitivity to investor sentiment. Integrating social media data, which reflects investor behavior and market psychology, into price forecasting models can therefore play a critical role in improving forecasting performance. Six leading cryptocurrencies, namely Bitcoin (BTC), Ethereum (ETH), Solana (SOL), Polygon (MATIC), Polkadot (DOT), and Cosmos (ATOM), were selected for the study based on market representation, liquidity profile, and data completeness. For the period from May 2021 to June 2023, minute-level price data was filtered for quality and consistency, resampled to an hourly frequency, and enriched with technical indicators (e.g., EMA, RSI, MACD, ATR, Momentum, Bollinger Bands, Return, and Log Return). For the same period, 6.8 million social media posts from Twitter and Reddit were processed with CryptoBERT-based sentiment analysis, and numerical sentiment metrics were generated on an hourly basis. In the modeling phase, three deep learning architectures, namely Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bidirectional LSTM (BiLSTM), were each trained in two configurations: first on market data only (price and technical indicators), and then with the social media sentiment variables added. Model performance was evaluated with MAPE, MAE, MSE, RMSE, and R², and the contribution of each architecture and feature set to prediction accuracy was analyzed. According to the results, the GRU architecture with social media sentiment was the most successful model, with an average MAPE of 0.76%.
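
The five evaluation metrics named above can be computed as in the following minimal sketch. This page does not give the thesis's own formulas, so the code uses the standard textbook definitions; the function name `regression_metrics` and the sample values are hypothetical and are not taken from the thesis.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAPE (in percent), MAE, MSE, RMSE and R^2 for two sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # MAPE divides by the actual value, so actual values are assumed non-zero
    # (true for asset prices).
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    # R^2 = 1 - SS_res / SS_tot; undefined when all actual values are equal.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"MAPE": mape, "MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical hourly closing prices and predictions (illustrative values only).
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 102.0, 104.0]
metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because MAPE is scale-free, it allows comparison across assets with very different price levels, such as BTC and MATIC, whereas MAE, MSE, and RMSE are expressed in the price units of each asset.
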
Social media integration provided a significant performance improvement especially for cryptocurrencies with a medium-sized market capitalization, such as Polkadot. In contrast, the BiLSTM architecture produced good results with price data alone, but its performance deteriorated when social media data was added. This thesis contributes to the literature by demonstrating that the effectiveness of social media sentiment integration varies significantly with model architecture and cryptocurrency characteristics, and by providing a methodological framework for selecting the best model-feature combination for different digital assets. The findings offer guidance for investors, analysts, and algorithmic trading systems, and they highlight the importance of architecture-specific responses to heterogeneous data sources in financial forecasting. Future research could explore additional model architectures, longer time horizons, and the integration of on-chain metrics to further improve prediction performance.

| File | Size | Format |
|---|---|---|
| Enes_Can_MERCAN_Final_Master_Thesis.pdf (open access). Description: Deep Learning-Based Cryptocurrency Price Prediction: A Comprehensive Study on the Integration of Financial Time Series and Social Media Sentiment | 5.84 MB | Adobe PDF |
Users are permitted to download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY NC ND license.
For further information and to check on the availability of the file, write to: unitesi@unipv.it.
https://hdl.handle.net/20.500.14239/33378