
Automated Feature Mapping for Audio DeepFake Detection

ALBERTI, ANDREA
2023/2024

Abstract

The proliferation of audio deepfake technology poses significant challenges to cybersecurity, privacy, and trust. Generated with advanced artificial intelligence techniques, audio deepfakes can imitate human speech with remarkable accuracy, enabling abuses such as identity theft, fraud, and disinformation. Detecting such synthetic audio has become critically important as these technologies continue to evolve. This thesis explores audio deepfake detection by proposing novel Machine Learning (ML) and Deep Learning (DL) approaches, with a primary focus on improving feature extraction and automating the feature mapping process. A comprehensive analysis of audio features, such as Mel-spectrograms, Mel-Frequency Cepstral Coefficients (MFCCs), and the Constant-Q Transform (CQT), among others, is conducted, and their effectiveness is enhanced through Convolutional Neural Networks (CNNs). Additionally, a fully end-to-end approach (DeepSpectraNetE2E) is proposed that learns time-frequency representations directly from raw audio, automating the entire feature extraction process. Alongside DeepSpectraNetE2E, three further models (DeepSpectraNet, DeepSpectraNetFlex, and DeepSpectraNetLite) are introduced, all of which surpass existing models in the literature. Experimental results demonstrate that these deep learning models significantly outperform traditional machine learning models in both accuracy and generalization. The results also highlight the effectiveness of combining multiple audio features and using CNN-based feature mapping strategies to enhance frequency-related information in the signal. In summary, this work contributes four new models that, through novel mapping strategies and automated feature extraction, detect sophisticated synthetic audio more accurately and generalize better.
Files in this record:
Andrea_Alberti_thesis.pdf (Adobe PDF, 22.88 MB, open access)

Users may download and share the full-text documents available in UNITESI UNIPV under the terms of the Creative Commons CC BY-NC-ND license.
For more information, or to check the availability of a file, write to: unitesi@unipv.it.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14239/33293