Modelli di Salienza Visiva: un Confronto Basato su dati Eye-Tracking
Visual Saliency Models: A Comparison Based on Eye-Tracking Ground-Truth Data
SEMPERBONI, MICHELE
2020/2021
Abstract
Saliency detection is one of the main active topics in the computer vision field. As the name suggests, saliency detection methods aim to emulate the human visual system in localizing the parts or objects in an image or video that grab our attention. Inspired by neurocognitive and psychological theories, many computational models have been proposed since the late 1990s. In the last few years, with the spread of Deep Learning and in particular Convolutional Neural Networks, new data-driven architectures have been introduced that outperform all existing ones on public datasets, since they combine "bottom-up" feature-based information with "top-down" high-level semantic cues and can learn from human ground truth. The purpose of this thesis is to bring together concepts and ideas from different research areas, focusing on learning models along with the most popular datasets and benchmarks. Furthermore, an analysis of some of these models over a set of face images is provided, faces being a well-known high-level feature. The results are directly compared with the corresponding ground-truth data obtained from eye-tracking experiments using different evaluation metrics, highlighting the differences between early handcrafted models and those based on machine learning techniques.
Users may download and share the full-text documents available in UNITESI UNIPV in compliance with the Creative Commons CC BY NC ND license.
For more information, and to check whether the file is available, write to: unitesi@unipv.it.
https://hdl.handle.net/20.500.14239/13800