Alternative splicing (AS) is an important co-transcriptional mechanism involved in the regulation of gene expression, which governs diverse cellular processes in both physiological and pathological conditions. Wobble splicing (WS) is a particular type of AS that occurs at alternative splice sites separated by a short distance. Like other types of AS, WS may perform a regulatory function, but knowledge of this topic is still limited. This analysis provides a computational approach for characterizing the tandem splice sites involved in wobble splicing at the donor junction, focusing in particular on the non-canonical GC site, in the human and mouse genomes, using annotations provided by the GENCODE consortium. Two classes of genes were included in the study: protein-coding genes (PCGs) and long non-coding RNA (lncRNA) genes. Several bioinformatics approaches were used to characterize the wobble splicing sites and to study their features, such as frequency and prevalence. The total number of 5' WS events in human was 1090 for PCGs and 253 for lncRNAs, which represent 2.2% of all AS events. Wobble splicing events occur with variable frequency among PCGs and lncRNA genes: 5.12% of all PCGs and 1.36% of all lncRNA genes are involved in wobbling, making wobble splicing a rare event. This trend was confirmed in mouse, which suggests a certain degree of conservation of such events. Moreover, in both human and mouse, an enrichment of non-canonical GC donor sites at wobble sites was observed, suggesting that they may play a regulatory role in this phenomenon. The expression of transcripts involved in wobbling was also evaluated: expression levels of WS transcripts were quantified and compared among ten different tissues. Expression was quantified as transcripts per million (TPM), and a transcript was considered significantly expressed when its TPM was greater than 0.5.
Transcripts involved in wobble splicing were found to have lower expression than other types of transcripts. Moreover, the expression of the paired transcripts involved in a wobble event varies depending on the splice site selected: transcripts with the non-canonical GC motif at the donor junction are less expressed than those with the canonical GT site. A still unclear aspect of WS is its tissue specificity, which was also investigated in this study. Transcripts were defined as tissue-specific when expressed in three or fewer of the ten tissues analyzed. It was found that 19% of WS transcripts are tissue-specific and that this specificity may depend on the donor site selected, GC or GT. This could be determined by fine regulation that involves not only cis-acting elements but also splicing regulatory factors. Wobble splicing also has effects at the protein level: it can induce the insertion or deletion of one or a few amino acids, cause a shift in the reading frame, or introduce a premature termination codon. These changes to protein sequence and structure are subtle, but they increase proteomic diversity. Lastly, wobbling may also affect subcellular localization, for both protein-coding transcripts and lncRNAs. It was observed that WS genes with a GC/GT pair at the donor site are transcribed into isoforms with differential localization in the cell, nuclear or cytoplasmic, and that genes with a GC/GT pair represented a small fraction of all the genes considered statistically significant (p-value < 0.05), in both PCGs and lncRNAs. In conclusion, wobble splicing seems to play an important role in the regulation of gene expression, with effects at several levels, in physiological and pathological conditions. This study provides an initial characterization of the wobble splicing process, but many aspects are still unclear and further investigation is needed to elucidate this mechanism.
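The two thresholds stated in the abstract (a transcript counts as expressed when TPM is greater than 0.5, and as tissue-specific when expressed in three or fewer of ten tissues) can be illustrated with a minimal Python sketch. This is not the code used in the thesis: the function names, the tissue names, and the TPM values are hypothetical, and the choice to exclude transcripts expressed in no tissue at all is an assumption the abstract does not address.

```python
from typing import Dict, List

TPM_THRESHOLD = 0.5  # from the abstract: expressed when TPM > 0.5
MAX_TISSUES = 3      # from the abstract: tissue-specific if expressed in <= 3 tissues


def expressed_tissues(tpm_by_tissue: Dict[str, float]) -> List[str]:
    """Return the tissues in which the transcript exceeds the TPM threshold."""
    return [tissue for tissue, tpm in tpm_by_tissue.items() if tpm > TPM_THRESHOLD]


def is_tissue_specific(tpm_by_tissue: Dict[str, float]) -> bool:
    """Apply the 'three or fewer tissues' rule to one transcript."""
    n_expressed = len(expressed_tissues(tpm_by_tissue))
    # Assumption: a transcript expressed in no tissue is not called tissue-specific.
    return 1 <= n_expressed <= MAX_TISSUES


# Hypothetical TPM values for one transcript across ten hypothetical tissues.
example_tpm = {
    "brain": 4.2, "heart": 0.6, "liver": 0.1, "lung": 0.5, "kidney": 0.0,
    "testis": 0.3, "colon": 0.2, "spleen": 0.0, "muscle": 0.4, "skin": 0.1,
}

print(expressed_tissues(example_tpm))   # tissues with TPM strictly above 0.5
print(is_tissue_specific(example_tpm))  # True when 1 to 3 tissues pass the threshold
```

The comparison is strict, so a value of exactly 0.5 does not count as expressed, matching the "more than 0.5" wording of the abstract.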
Bioinformatics analysis of the expression of transcripts involved in wobble splicing. (Analisi bioinformatiche dell'espressione dei trascritti coinvolti nel wobble splicing)
NICASTRO, ARIANNA
2019/2020
Users may download and share the full-text documents available in UNITESI UNIPV in compliance with the Creative Commons CC BY-NC-ND license.
For more information, and to check whether the file is available, write to: unitesi@unipv.it.
https://hdl.handle.net/20.500.14239/12156