Anomaly Detection with Supervised/Unsupervised Machine Learning

SUNDARAVEL, MAGESH
2022/2023

Abstract

This thesis addresses anomaly detection in time series data by combining statistical and machine learning techniques. Models are trained to predict a signal's behavior from its historical patterns. To make the models more robust, synthetic anomalies are injected into the data. This expands the training set and helps the models recognize a broader range of potential faults, particularly when real-world anomalies are scarce. Statistical methods, namely standard deviation and interquartile range (IQR) analysis, are first used to detect anomalies. The anomalies they detect then serve as inputs to the machine learning models, which refines the detection process and improves the models' versatility and accuracy when they are applied to the actual dataset. The proposed method relies on segmentation of the signal and on adjustable parameters for the anomaly tests. Its effectiveness is evaluated using Precision, Recall, F1-Score, and ROC AUC Score, and its performance is further analyzed with respect to the prevalence of anomalies and to variations in specific model parameters. The methods are applied to a validation dataset of signals from HVAC systems. The unsupervised approaches, the standard deviation and interquartile range methods, proved effective for simple signals, but their performance decreased for more complex ones. The supervised approaches, a support vector machine and a random forest, performed strongly on both simple and complex signals. In specific instances, the evaluation metrics reach a perfect score of 100%.
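The statistical stage described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the segment length, the thresholds, or the shape of the synthetic anomalies, so the spike-style injection, the multipliers `k=3.0` and `k=1.5`, and every function name below are assumptions chosen for illustration. The supervised stage (support vector machine, random forest) and the ROC AUC score are left out.

```python
import math
import random
import statistics


def inject_spikes(signal, n_spikes, magnitude, rng):
    """Copy `signal`, add `n_spikes` synthetic spikes, and return (signal, 0/1 labels)."""
    corrupted = list(signal)
    labels = [0] * len(signal)
    for i in rng.sample(range(len(signal)), n_spikes):
        corrupted[i] += magnitude * rng.choice([-1, 1])
        labels[i] = 1
    return corrupted, labels


def sd_test(segment, k=3.0):
    """Flag points farther than k standard deviations from the segment mean."""
    mean = statistics.fmean(segment)
    sd = statistics.pstdev(segment)
    return [int(abs(x - mean) > k * sd) for x in segment]


def iqr_test(segment, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] within the segment."""
    if len(segment) < 2:  # quantiles need at least two points
        return [0] * len(segment)
    q1, _, q3 = statistics.quantiles(segment, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [int(x < low or x > high) for x in segment]


def detect(signal, test, segment_len, **params):
    """Apply `test` independently to consecutive segments of length `segment_len`."""
    flags = []
    for start in range(0, len(signal), segment_len):
        flags.extend(test(signal[start:start + segment_len], **params))
    return flags


def precision_recall_f1(y_true, y_pred):
    """Compute Precision, Recall, and F1-Score for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical "simple" signal: a noisy sine wave, not HVAC data.
    clean = [math.sin(2 * math.pi * t / 50) + rng.gauss(0, 0.05) for t in range(400)]
    noisy, labels = inject_spikes(clean, n_spikes=8, magnitude=5.0, rng=rng)
    for name, test in (("SD", sd_test), ("IQR", iqr_test)):
        flags = detect(noisy, test, segment_len=100)
        print(name, precision_recall_f1(labels, flags))
```

Both tests are applied per segment, so the multiplier `k` and `segment_len` act as the adjustable parameters. In a pipeline like the one the abstract outlines, the resulting flags could then be passed on as labels or features to a supervised classifier.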
Users may download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY NC ND license.
For more information, and to check whether the file is available, write to: unitesi@unipv.it.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14239/17201