
Reti neurali ricorrenti LSTM per la generazione di testo: tuning e training di un algoritmo in grado di creare frasi 'naturali'

LSTM Recurrent Neural Network for text generation. Tuning and training of an algorithm capable of generating human-like sentences

PELAGATTI, SIMONE
2016/2017

Abstract

The purpose of this thesis is to tune and train a Recurrent Neural Network to learn hidden text structures through an Unsupervised Learning algorithm capable of generating original sentences from a dataset restricted to a specific topic. The algorithm is tuned to recognise similarities in existing sentences and to reproduce coherent patterns of words related to that topic. The problem this work tries to solve is teaching a computer to write original content about a specific topic, creating sentences that can stand alongside real ones written by humans without any relevant difference. This result is achieved by training a Long Short-Term Memory (LSTM) network on a topic-related dataset acquired from Internet forums. In particular, the training is performed on restaurant reviews taken from the website Tripadvisor, which provides a considerable volume of sentences, tagged by review score, from which valuable text features can be derived: which words are used to write good or bad reviews, the syntactic structure of a review, and the topic of the sentences (the kind of restaurant customers are talking about). The first part of this work concerns building a fairly large dataset by collecting data from Tripadvisor and parsing it into the input format required by the LSTM network. The next step is to test different hyperparameters, trying to achieve the best result in terms of loss and accuracy on the split dataset (training and testing sentences). Lastly, a script uses the trained network to generate new content from custom seeds given by the user. The generated results are sentences of varying length that start with the given seed; the best results concern the semantic combination of words (the sentence contains words from the same semantic field as the input), while for the grammatical construction of the sentence, revision by a human is still necessary.
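The preprocessing and seeded-generation steps summarised above can be sketched as follows. This is a minimal illustration only, assuming a character-level setup; the function and variable names are hypothetical, and the LSTM network itself (which the thesis trains on the Tripadvisor reviews) is omitted here.

```python
import numpy as np

def build_sequences(text, seq_len=10, step=3):
    """Slice raw review text into fixed-length windows; each window is
    an input sample and the character that follows it is the target."""
    chars = sorted(set(text))
    char_to_idx = {c: i for i, c in enumerate(chars)}
    inputs, targets = [], []
    for i in range(0, len(text) - seq_len, step):
        inputs.append(text[i:i + seq_len])
        targets.append(text[i + seq_len])
    # One-hot encode: X has shape (samples, seq_len, vocab),
    # y has shape (samples, vocab), as an LSTM layer would expect.
    X = np.zeros((len(inputs), seq_len, len(chars)), dtype=np.float32)
    y = np.zeros((len(targets), len(chars)), dtype=np.float32)
    for s, seq in enumerate(inputs):
        for t, ch in enumerate(seq):
            X[s, t, char_to_idx[ch]] = 1.0
        y[s, char_to_idx[targets[s]]] = 1.0
    return X, y, chars

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Draw the next character index from a predicted distribution.
    Lower temperatures make generation more conservative; higher
    temperatures make it more varied (and more error-prone)."""
    rng = rng or np.random.default_rng(0)
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-9) / temperature
    scaled = np.exp(logits) / np.exp(logits).sum()
    return int(rng.choice(len(scaled), p=scaled))
```

At generation time, the user's seed string would be encoded with `build_sequences`-style one-hot vectors, fed to the trained network, and extended one character at a time by repeatedly applying `sample_with_temperature` to the network's output distribution.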


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14239/25156