Deep Learning-Enhanced Near-Infrared Spectroscopy for Cross-Instrumental Petrolatum Discrimination
GATTI, ASIA
2025/2026
Abstract
Near-Infrared Spectroscopy (NIRS) is a cornerstone of quality discrimination in the pharmaceutical industry, yet the high cost of laboratory equipment often limits its widespread field application [6, 14]. This thesis investigates the feasibility of replacing expensive, high-resolution instruments (INSTR1) with a discrete, low-cost portable prototype (INSTR2) for the classification of pharmaceutical-grade petrolatum. The central research question is whether advanced computational pipelines can effectively compensate for the physical limitations and hardware discrepancies of a lower-resolution sensor. To address this, an experimental campaign was conducted, evaluating a pipeline of 23 different architectures ranging from traditional statistical classifiers to modern Deep Learning (DL) models, including Transformers and 1D-Convolutional Neural Networks (CNNs). A critical focus was placed on how algorithmic stability and execution time are affected by sample rotation noise and spectral resolution. The results reveal a Resolution Paradox: the high-resolution data from INSTR1 introduces significant multicollinearity and background noise, making generalization difficult for most architectures. Conversely, the discrete, 27-wavelength signature of INSTR2 proved to be intrinsically linearly separable, allowing a wide variety of models to achieve near-perfect classification. The analysis of algorithmic efficiency for industrial edge computing showed that heavy architectures are fundamentally over-engineered for this task. Linear Discriminant Analysis (LDA) emerged as the “Gold Standard”. By maximizing class separation, LDA achieved a deterministic 100.0% accuracy with zero variance on both instruments. Most importantly, it achieved this directly on raw spectral data, eliminating the need for Principal Component Analysis (PCA) dimensionality reduction and minimizing computational overhead.
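The idea of LDA separating classes directly on raw spectra, with no PCA step, can be illustrated with a minimal sketch. It is not the thesis's pipeline: it assumes only two classes, uses NumPy, and runs on synthetic 27-wavelength spectra whose baseline, class shift, and noise level are invented for illustration. The function names `make_synthetic_spectra`, `fit_fisher_lda`, and `predict` are hypothetical.

```python
import numpy as np


def make_synthetic_spectra(n_per_class=40, n_wavelengths=27, seed=0):
    """Generate toy two-class spectra (synthetic data, NOT the thesis measurements)."""
    rng = np.random.default_rng(seed)
    base = np.linspace(0.2, 0.8, n_wavelengths)   # assumed smooth baseline
    shift = np.zeros(n_wavelengths)
    shift[10:15] = 0.05                           # assumed class difference in 5 bands
    noise = 0.005                                 # assumed measurement noise (std)
    X0 = base + rng.normal(0.0, noise, (n_per_class, n_wavelengths))
    X1 = base + shift + rng.normal(0.0, noise, (n_per_class, n_wavelengths))
    X = np.vstack([X0, X1])
    y = np.repeat([0, 1], n_per_class)
    return X, y


def fit_fisher_lda(X, y, ridge=1e-8):
    """Two-class Fisher LDA on raw features; returns weights w and bias b."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance estimate.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw /= len(X) - 2
    # Small ridge term keeps the solve stable if bands are strongly correlated.
    Sw += ridge * np.eye(X.shape[1])
    # Direction that maximizes between-class over within-class separation.
    w = np.linalg.solve(Sw, mu1 - mu0)
    # Decision threshold at the midpoint of the class means (equal priors).
    b = -0.5 * w @ (mu0 + mu1)
    return w, b


def predict(X, w, b):
    """Assign class 1 where the linear discriminant score is positive."""
    return (X @ w + b > 0).astype(int)


if __name__ == "__main__":
    X_train, y_train = make_synthetic_spectra(seed=0)
    X_test, y_test = make_synthetic_spectra(seed=1)
    w, b = fit_fisher_lda(X_train, y_train)
    accuracy = np.mean(predict(X_test, w, b) == y_test)
    print(f"held-out accuracy on synthetic spectra: {accuracy:.3f}")
```

Once fitted, classifying a spectrum costs one dot product over the 27 bands, which is the kind of low overhead that suits edge deployment. A multi-class problem would need the general form of LDA rather than this two-class version.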
These findings demonstrate that a rigorous mathematical foundation can perfectly bridge the gap between low-cost hardware and laboratory-grade reliability, enabling real-time quality control.

| File | Size | Format |
|---|---|---|
| thesis.pdf (embargoed until 02/11/2026) | 5.69 MB | Adobe PDF |
Users may download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY NC ND license.
For further information, and to check whether the file is available, write to: [email protected].
https://hdl.handle.net/20.500.14239/34975