
Reti neurali ricorrenti LSTM per la generazione di testo: tuning e training di un algoritmo in grado di creare frasi 'naturali'

LSTM Recurrent Neural Network for text generation. Tuning and training of an algorithm capable of generating human-like sentences

PELAGATTI, SIMONE
2016/2017

Abstract

The purpose of this thesis is to tune and train a Recurrent Neural Network to learn hidden text structures through an Unsupervised Learning algorithm capable of generating original sentences from a dataset restricted to a specific topic. The algorithm is tuned to recognise similarities in existing sentences and to reproduce coherent patterns of words related to that topic. The problem this work tries to solve is teaching a computer to write original content about a specific topic, creating sentences that can stand alongside real ones written by humans without any relevant difference. This result is achieved by training a Long Short-Term Memory (LSTM) network on a topic-related dataset acquired from Internet forums. In particular, the training is performed on restaurant reviews taken from the website Tripadvisor, which provides a considerable volume of sentences, tagged by review score, from which valuable text features can be derived: which words are used to write good or bad reviews, the syntactic structure of a review, and the topic of the sentences (the kind of restaurant customers are talking about). The first part of this work concerns building a fairly large dataset by collecting data from Tripadvisor and parsing it into the input format required by the LSTM network. The next step is to test different hyperparameters, trying to achieve the best result in terms of loss and accuracy on the split dataset (training and testing sentences). Lastly, a script uses the trained network to generate new content from custom seeds given by the user. The generated results are sentences of varying length that start with the given seed; the best results concern the semantic combination of words (the sentence contains words from the same semantic field as the input), while for the grammatical construction of the sentence, revision by a human is still necessary.
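The preprocessing and seeded-generation steps summarised above can be sketched as follows. This is a minimal illustration only, assuming a character-level setup; the function and variable names are hypothetical, and the LSTM network itself (which the thesis trains on the Tripadvisor reviews) is omitted here.

```python
import numpy as np

def build_sequences(text, seq_len=10, step=3):
    """Slice raw review text into fixed-length windows; each window is
    an input sample and the character that follows it is the target."""
    chars = sorted(set(text))
    char_to_idx = {c: i for i, c in enumerate(chars)}
    inputs, targets = [], []
    for i in range(0, len(text) - seq_len, step):
        inputs.append(text[i:i + seq_len])
        targets.append(text[i + seq_len])
    # One-hot encode: X has shape (samples, seq_len, vocab),
    # y has shape (samples, vocab), as an LSTM layer would expect.
    X = np.zeros((len(inputs), seq_len, len(chars)), dtype=np.float32)
    y = np.zeros((len(targets), len(chars)), dtype=np.float32)
    for s, seq in enumerate(inputs):
        for t, ch in enumerate(seq):
            X[s, t, char_to_idx[ch]] = 1.0
        y[s, char_to_idx[targets[s]]] = 1.0
    return X, y, chars

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Draw the next character index from a predicted distribution.
    Lower temperatures make generation more conservative; higher
    temperatures make it more varied (and more error-prone)."""
    rng = rng or np.random.default_rng(0)
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-9) / temperature
    scaled = np.exp(logits) / np.exp(logits).sum()
    return int(rng.choice(len(scaled), p=scaled))
```

At generation time, the user's seed string would be encoded with `build_sequences`-style one-hot vectors, fed to the trained network, and extended one character at a time by repeatedly applying `sample_with_temperature` to the network's output distribution.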


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14239/25156