Design and Development of Interactive Tools in Orange for Batch Effect Correction in scRNA-seq
GOTTINGER, MICHELE
2024/2025
Abstract
Single-cell RNA sequencing (scRNA-seq) data analysis enables the study of gene expression at the single-cell level, offering new opportunities to understand cellular heterogeneity in complex biological systems. However, these data are frequently affected by batch effects, that is, non-biological technical variations introduced by differences between experiments, sequencing technologies, or laboratory conditions. Such effects can compromise the biological interpretation of the data and hinder the integration of datasets from different experiments. This thesis addresses the problem of batch effect correction, with the ultimate goal of extending the functionality of the open-source software Orange for scRNA-seq data analysis. After a comparative analysis of several state-of-the-art methods, Harmony was selected for its strong performance in batch integration, preservation of biological information, and computational efficiency. The algorithm was integrated into the Orange Single Cell Add-On through a new widget dedicated to batch effect correction. In addition, a second widget, named Batch Correction Evaluation, was designed and implemented to quantitatively assess correction quality on corrected datasets or to evaluate the impact of batch effects on uncorrected data. This work was carried out during an Erasmus Traineeship in Professor Blaž Zupan’s laboratory at the Department of Computer and Information Science, University of Ljubljana, and contributes to the expansion of the Orange software ecosystem.
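The abstract does not say which metrics the Batch Correction Evaluation widget computes. As an illustration only, the sketch below computes one common kind of quantitative batch-mixing score: the mean normalized Shannon entropy of batch labels within each cell's k nearest neighbors, computed on an embedding such as PCA coordinates. The function name `batch_mixing_entropy`, the default `k=15`, and the brute-force neighbor search are assumptions made for this sketch. They are not the widget's actual implementation.

```python
import numpy as np


def batch_mixing_entropy(embedding, batches, k=15):
    """Mean normalized entropy of batch labels among each cell's k nearest neighbors.

    Hypothetical helper for illustration. Returns a value in [0, 1]:
    near 1 means neighborhoods contain all batches in similar proportions
    (good mixing); near 0 means neighborhoods are dominated by one batch.
    """
    X = np.asarray(embedding, dtype=float)
    labels, codes = np.unique(np.asarray(batches), return_inverse=True)
    codes = codes.ravel()
    n_cells, n_batches = X.shape[0], len(labels)
    if n_batches < 2:
        raise ValueError("need at least two batches")
    if not 0 < k < n_cells:
        raise ValueError("k must be between 1 and n_cells - 1")

    # Brute-force pairwise squared Euclidean distances (fine for small examples).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :k]

    # Fraction of each cell's neighbors that belong to each batch.
    neighbor_codes = codes[neighbors]
    p = np.stack([(neighbor_codes == b).mean(axis=1) for b in range(n_batches)], axis=1)

    # Shannon entropy per cell, treating 0 * log(0) as 0, normalized by log(#batches).
    with np.errstate(divide="ignore", invalid="ignore"):
        ent = -np.sum(np.where(p > 0, p * np.log(p), 0.0), axis=1)
    return float(np.mean(ent / np.log(n_batches)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mixed = rng.normal(size=(200, 2))      # both batches drawn from the same distribution
    batch = np.repeat(["A", "B"], 100)
    separated = mixed + np.where(batch[:, None] == "B", 10.0, 0.0)  # batch B shifted away
    print(batch_mixing_entropy(mixed, batch))      # high: batches share neighborhoods
    print(batch_mixing_entropy(separated, batch))  # near 0: neighborhoods are single-batch
```

A score like this can be computed on raw data to gauge how strong the batch effect is, and again after correction to check whether mixing improved. A complete evaluation would also need a complementary measure of preserved biological structure, since perfect mixing can also result from over-correction.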
File: Tesi.pdf (open access), 4.23 MB, Adobe PDF
Users may download and share the full-text documents available in UNITESI UNIPV under the terms of the Creative Commons CC BY-NC-ND license.
For more information, or to check whether the file is available, write to: unitesi@unipv.it.
https://hdl.handle.net/20.500.14239/35066