
Automated Feature Mapping for Audio DeepFake Detection

ALBERTI, ANDREA
2023/2024

Abstract

The proliferation of audio deepfake technology poses significant challenges to cybersecurity, privacy, and trust. Generated with advanced artificial intelligence techniques, audio deepfakes can imitate human speech with remarkable accuracy, enabling abuses such as identity theft, fraud, and disinformation. Detecting such synthetic audio has become critically important as these technologies continue to evolve. This thesis explores audio deepfake detection by proposing novel Machine Learning (ML) and Deep Learning (DL) approaches, with a primary focus on improving feature extraction and automating the feature mapping process. A comprehensive analysis of audio features, such as Mel-spectrograms, Mel-Frequency Cepstral Coefficients (MFCCs), and the Constant-Q Transform (CQT), among others, is conducted, and their effectiveness is enhanced through Convolutional Neural Networks (CNNs). Additionally, a fully end-to-end approach (DeepSpectraNetE2E) is proposed that learns time-frequency representations directly from raw audio, automating the entire feature extraction process. Alongside DeepSpectraNetE2E, three further models (DeepSpectraNet, DeepSpectraNetFlex, and DeepSpectraNetLite) are introduced, all of which surpass existing models in the literature. Experimental results demonstrate that these deep learning models significantly outperform traditional machine learning models in both accuracy and generalization. The results also highlight the effectiveness of combining multiple audio features and using CNN-based feature mapping strategies to enhance frequency-related information in the signal. In summary, this work contributes four new models that, through novel mapping strategies and automated feature extraction, detect sophisticated synthetic audio more accurately and generalize better.
Files in this record:
Andrea_Alberti_thesis.pdf (Adobe PDF, 22.88 MB, open access)

Users may download and share the full-text documents available in UNITESI UNIPV under the terms of the Creative Commons CC BY-NC-ND license.
For more information, or to check the availability of a file, write to: unitesi@unipv.it.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14239/33293