SENTIMENT ANALYSIS AND CLASSIFICATION ON SOCIAL DATA

LI, TIANQI

2016/2017

Abstract

With the growth of the Internet, social platforms such as Twitter and other microblogs have emerged. Because people express their opinions on these platforms, their emotional tendencies can be analyzed through tweets. Given the huge volume of data involved, automated classification is essential. This thesis addresses the classification of sentiment polarity. First, it surveys the current state of sentiment analysis and the technologies used for it. Second, it proposes a methodology covering data processing, feature extraction, and algorithms. Finally, it evaluates the effect of various features and algorithms in order to identify the best method for sentiment analysis. The key points of the thesis are:
(1)	A variety of processing methods. Data are collected from tweets and saved in a predefined format, then filtered, segmented into words, tagged with parts of speech, and so on.
(2)	A variety of features. The platform considers 16 sentiment features, including punctuation, parts of speech, and other attributes. Building on existing theories and techniques, N-gram calculation and a dependency-relation method are also used, so that combining features yields a comprehensive analysis. To shorten classification time, features are ranked by importance and those that contribute most are retained.
(3)	A hybrid approach that combines machine learning with a lexicon to improve sentiment classification performance. Both are used in the algorithm module. The lexicon-based approach relies on an emotional dictionary of words with known sentiment scores and computes each sentence's score from the scores of its words.
(4)	The combination test. With so many options for data preprocessing, feature extraction, and algorithm selection, it is difficult for users to choose well. Because trying each combination in turn is time-consuming, the results of the methods are computed in multiple threads; the program then selects the best way to handle the data and returns the results to the user.
Key words: Sentiment Analysis, Machine Learning, Semi-supervised Learning Algorithm, Text Classification, Emotional Dictionary
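
The lexicon-based step in point (3) can be illustrated with a minimal sketch. The abstract says only that a sentence score is calculated from word scores; the summation rule, the tokenizer, the names `TOY_LEXICON`, `sentence_score`, and `polarity`, and the sample scores below are all assumptions made for illustration, not the thesis's actual dictionary or formula.

```python
import re

# Hypothetical toy lexicon (word -> sentiment score). A real emotional
# dictionary would be far larger; these entries are illustrative only.
TOY_LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0,
    "bad": -1.0, "awful": -2.0, "hate": -2.0,
}

def sentence_score(text, lexicon=TOY_LEXICON):
    """Sum the scores of known words; unknown words contribute 0 (an assumption)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0.0) for token in tokens)

def polarity(text, lexicon=TOY_LEXICON):
    """Map the sentence score to a polarity label by its sign."""
    score = sentence_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `polarity("awful service, I hate it")` sums two negative word scores and so returns `"negative"`. A plain sum ignores negation and intensity, which is one reason a lexicon is often paired with a learned classifier, as the hybrid approach does.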

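The combination test in point (4) might look roughly like the sketch below, which scores every preprocessing/classifier pair in a thread pool and keeps the most accurate one. The sample data, the two preprocessing options, and the two stand-in classifiers are hypothetical placeholders; the abstract does not state which methods the platform actually combines or how it scores them.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Tiny labeled sample (1 = positive, 0 = negative), invented for illustration.
SAMPLES = [("I LOVE it", 1), ("so BAD", 0), ("love love", 1), ("bad day", 0)]

# Hypothetical preprocessing options.
PREPROCESSORS = {
    "raw": lambda text: text,
    "lowercase": lambda text: text.lower(),
}

# Hypothetical stand-ins for classifiers: each maps a text to a 0/1 label.
CLASSIFIERS = {
    "keyword_love": lambda text: 1 if "love" in text else 0,
    "always_positive": lambda text: 1,
}

def evaluate(config):
    """Return (config, accuracy) for one preprocessing/classifier pair."""
    prep_name, clf_name = config
    prep, clf = PREPROCESSORS[prep_name], CLASSIFIERS[clf_name]
    correct = sum(clf(prep(text)) == label for text, label in SAMPLES)
    return config, correct / len(SAMPLES)

def best_combination(max_workers=4):
    """Evaluate all combinations in worker threads and return the best one."""
    configs = list(product(PREPROCESSORS, CLASSIFIERS))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(evaluate, configs))
    return max(results, key=lambda item: item[1])
```

On this toy data, `best_combination()` picks the lowercase preprocessing with the keyword classifier, since lowercasing lets the keyword match "LOVE". In CPython, threads help mainly when the work is I/O-bound or releases the interpreter lock; for CPU-bound evaluation a process pool may be a better fit.
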
Record

Faculty/Department: DIPARTIMENTO DI INGEGNERIA INDUSTRIALE E DELL'INFORMAZIONE
Degree programme: COMPUTER ENGINEERING [06415]
Academic year: 2016
English title: SENTIMENT ANALYSIS AND CLASSIFICATION ON SOCIAL DATA
Supervisor: MOTTA, GIANMARIO PIERO ANTONIO
Co-supervisor: MA, TIANYI
Appears in collections: Lauree Magistrali (master's degree theses)

Files in this item:

There are no files associated with this item.

Users may download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY NC ND license.
For more information, and to check whether the file is available, write to: [email protected].

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14239/21431