Deep Learning-Based Cryptocurrency Price Forecasting: A Comprehensive Study of Financial Time Series and Social Media Sentiment Integration

MERCAN, ENES CAN
2023/2024

Abstract

This thesis investigates whether combining deep learning models with social media sentiment analysis improves price prediction in cryptocurrency markets. High volatility and sensitivity to investor sentiment make these markets a difficult setting for traditional financial forecasting methods, so adding social media data that reflects investor behavior and market psychology to price forecasting models can play a critical role in improving forecasts. Six leading cryptocurrencies, Bitcoin (BTC), Ethereum (ETH), Solana (SOL), Polygon (MATIC), Polkadot (DOT), and Cosmos (ATOM), were selected on the basis of market representation, liquidity profile, and data completeness. For the period from May 2021 to June 2023, minute-level price data were filtered for quality and consistency, resampled to hourly frequency, and enriched with technical indicators (e.g., EMA, RSI, MACD, ATR, Momentum, Bollinger Bands, Return, and Log Return). For the same period, 6.8 million posts from Twitter and Reddit were scored with CryptoBERT-based sentiment analysis and aggregated into hourly numerical sentiment metrics. Three deep learning architectures, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bidirectional LSTM (BiLSTM), were each trained in two configurations: first on market data only (price and technical indicators), and then with the social media sentiment variables added. Model performance was evaluated with statistical metrics such as MAPE, MAE, MSE, RMSE, and R², and the contribution of each architecture and feature set to prediction accuracy was analyzed. According to the results, the GRU architecture with social media sentiment was the most successful model, with an average MAPE of 0.76%.
Social media integration yielded a significant performance improvement particularly for cryptocurrencies with a medium-sized market capitalization, such as Polkadot. In contrast, the BiLSTM architecture produced good results with price data, but adding social media data degraded its performance. This thesis contributes to the literature by showing that the effectiveness of social media sentiment integration varies significantly with model architecture and cryptocurrency characteristics, and by providing a methodological framework for selecting the best model-feature combination for a given digital asset. The findings offer guidance for investors, analysts, and algorithmic trading systems and highlight that different architectures respond differently to heterogeneous data sources in financial forecasting. Future research could explore additional model architectures, longer time horizons, and the integration of on-chain metrics to further improve prediction performance.
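
The abstract names the evaluation metrics but does not spell out their formulas. The following is a minimal sketch of the standard textbook definitions of MAPE, MAE, MSE, RMSE, and R² in plain Python. The function name `regression_metrics` and the sample price values are hypothetical and chosen only for illustration; they are not taken from the thesis, and the thesis may compute these metrics differently (for example with a library or on scaled data).

```python
import math
from typing import Dict, Sequence


def regression_metrics(actual: Sequence[float],
                       predicted: Sequence[float]) -> Dict[str, float]:
    """Return MAPE (in percent), MAE, MSE, RMSE and R^2 for paired series.

    Hypothetical helper using the standard definitions; not the thesis code.
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")

    # Per-observation forecast errors (actual minus predicted).
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE: mean of absolute errors relative to the actual value, in percent.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    # R^2: 1 minus residual sum of squares over total sum of squares.
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mape": mape, "mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Illustrative values only (not data from the thesis).
actual_prices = [100.0, 102.0, 101.0, 105.0]
predicted_prices = [101.0, 101.0, 102.0, 104.0]
metrics = regression_metrics(actual_prices, predicted_prices)
print(metrics)
```

Because MAPE divides each error by the actual value, it is scale-free, which is presumably why it is the metric averaged across the six assets; MAE, MSE, and RMSE stay in price units and are not directly comparable between, say, BTC and MATIC.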
2023
Deep Learning-Based Cryptocurrency Price Forecasting: A Comprehensive Study of Financial Time Series and Social Media Sentiment Integration
Files in this item:
File: Enes_Can_MERCAN_Final_Master_Thesis.pdf
Access: open access
Description: Deep Learning-Based Cryptocurrency Price Forecasting: A Comprehensive Study of Financial Time Series and Social Media Sentiment Integration
Size: 5.84 MB
Format: Adobe PDF

Users may download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY NC ND license.
For more information, and to check whether the file is available, write to: unitesi@unipv.it.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14239/33378