Sintesi di dataset compatti in ambienti decentralizzati: un framework sicuro e privacy-preserving per la Federated Dataset Distillation

Federated Learning and Dataset Distillation are topics of growing interest in artificial intelligence, both in academia and in industry. This dissertation proposes a framework that extends Dataset Distillation to a federated setting. The approach differs substantially from those in the existing literature: the goal of the framework is to synthesize a small dataset from data held by different entities in a federated environment. This dataset can then be used to train machine learning models easily, without significantly degrading their performance compared with centralized aggregation of the individual datasets and without requiring the direct sharing of potentially sensitive information. After describing the framework in detail, the dissertation examines it from a security standpoint, analyzing vulnerabilities that could undermine the confidentiality of the data used during distillation when the federated aggregator is an honest-but-curious (passive) attacker. A possible solution to this problem is then proposed, capable of protecting user privacy without compromising the success of the distillation. Finally, the significant experimental results supporting the proposal are reported; they constitute a promising foundation for future research in this direction.

MURER, DANIELE

2023/2024

Record

Faculty/Department	DIPARTIMENTO DI INGEGNERIA INDUSTRIALE E DELL'INFORMAZIONE
Degree programme	COMPUTER ENGINEERING [06415]
Academic year	2023
English title	Condensing Data in Decentralized Environments: A Secure and Privacy-Preserving Framework for Federated Dataset Distillation
Supervisor	NOCERA, ANTONINO
Co-supervisors	ARAZZI, MARCO; CIHANGIROGLU, MERT
Appears in types	Master's degrees (Lauree Magistrali)

Files in this item:

File	Size	Format
Tesi - Daniele Murer.pdf (not available)	5.21 MB	Adobe PDF
Description: Thesis titled "Sintesi di dataset compatti in ambienti decentralizzati: un framework sicuro e privacy-preserving per la Federated Dataset Distillation"

Users may download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY NC ND license.
For more information, and to check whether the file is available, write to: unitesi@unipv.it.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14239/33379