Abstract. Alternative splicing (AS) comprises a complex set of mechanisms that can strongly modulate and influence almost every cellular process. Among the various types of AS events, wobble splicing (WS) is an interesting process involving alternative splice sites located at very short distances from each other. It is conceivable that wobble splicing plays major roles in several regulatory pathways; however, only a few studies have focused extensively on this topic, which limits its understanding. This work uses computational approaches to characterize the splicing junctions involved in wobbling at both donor and acceptor junctions, with particular interest in non-canonical GC donor splice sites. The analysis includes not only protein coding genes but also the emerging class of long noncoding RNAs, and considers the genomes of human and mouse. Recent genome annotations provided by the GENCODE consortium were used as the starting point to identify all splice sites. Various bioinformatics techniques were used to perform a comprehensive characterization of all sites and to investigate their frequency and main features. This study showed that WS events are quite rare, especially those occurring at 5’ splicing junctions. Additionally, their frequency seems to vary between protein coding and long noncoding RNA genes. Interestingly, donor splice sites containing the non-canonical GC motif were found to be particularly abundant in wobble splicing events, suggesting their involvement as important cis-acting elements associated with wobbling. All the results obtained in the human genome were also confirmed in mouse, showing a good degree of conservation. The splicing efficiency of each junction was also calculated as a strength score, to assess whether splice sites that participate in wobbling behave differently from the totality of splice sites.
Remarkably, for both donor and acceptor junctions, splice sites involved in wobbling were often weaker than the corresponding global junctions, suggesting that this type of process could preferentially require junctions that are only loosely recognized by the spliceosome. Notably, differences were also seen between the strength scores of canonical GT and non-canonical GC splice sites. A correlation analysis between the variation in strength of splice sites involved in wobbling and their reciprocal distance was also carried out, revealing contrasting features of donor and acceptor junctions: these values were negatively correlated in 5’ WS and positively correlated in 3’ WS. Each wobble splicing junction was classified into strength categories, showing that most of them are characterized by rather weak splicing efficiency. When considered as pairs, alternative sites belonging to the same event were more likely to have similar strength scores. Finally, the effects of 5’ wobbling were investigated with respect to human mRNAs and proteins, to detect possible outcomes of alternative splicing such as altered transcript stability, frameshifts, insertion of stop codons and loss of entire protein domains. All events were classified according to the type of induced effect, and further investigations were carried out to assess the functional relevance of such changes. Although most WS events were predicted to have small effects, a considerable number of events were still expected to have a strong impact. In conclusion, wobble splicing could represent an important regulatory mechanism characteristic of both protein coding and long noncoding RNA genes. This work provides an updated, genome-wide characterization of splice sites involved in wobbling and answers multiple open questions. However, various aspects remain elusive, and further investigations should follow to better unravel this complex topic.
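As an illustration of the kind of analysis summarized above, the following is a minimal Python sketch of how wobble splicing events might be detected from annotated introns and how the frame consequence of the donor or acceptor shift could be labelled. It is not the procedure used in this work. The input layout (tuples of chromosome, strand, donor position, acceptor position), the function names, the requirement that alternative sites share the opposite end of the intron, and the distance threshold `MAX_WOBBLE_DISTANCE = 6` are all assumptions made for the example; the abstract does not state the distance cut-off that was adopted.

```python
from collections import defaultdict
from itertools import combinations

# Assumed threshold (in nucleotides) for calling two alternative sites
# a "wobble" pair; the abstract does not give the value actually used.
MAX_WOBBLE_DISTANCE = 6


def find_wobble_pairs(junctions, max_distance=MAX_WOBBLE_DISTANCE):
    """Return candidate wobble events from annotated introns.

    junctions: iterable of (chrom, strand, donor, acceptor) tuples
               (hypothetical layout, e.g. derived from an annotation file).
    Each event is (chrom, strand, kind, site_a, site_b, distance), where
    kind is "5'" for alternative donors and "3'" for alternative acceptors.
    """
    donors_by_acceptor = defaultdict(set)
    acceptors_by_donor = defaultdict(set)
    for chrom, strand, donor, acceptor in junctions:
        donors_by_acceptor[(chrom, strand, acceptor)].add(donor)
        acceptors_by_donor[(chrom, strand, donor)].add(acceptor)

    events = []
    for kind, groups in (("5'", donors_by_acceptor), ("3'", acceptors_by_donor)):
        for (chrom, strand, _shared_end), sites in groups.items():
            # Compare every pair of alternative sites sharing the other end.
            for site_a, site_b in combinations(sorted(sites), 2):
                distance = site_b - site_a
                if distance <= max_distance:
                    events.append((chrom, strand, kind, site_a, site_b, distance))
    return events


def frame_effect(distance):
    """Label the shift as frame-preserving or frame-shifting.

    Only meaningful when the affected region is protein coding.
    """
    return "in-frame" if distance % 3 == 0 else "frameshift"


if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    toy_junctions = [
        ("chr1", "+", 100, 500),
        ("chr1", "+", 104, 500),  # alternative donor, 4 nt away
        ("chr1", "+", 100, 503),  # alternative acceptor, 3 nt away
        ("chr1", "+", 300, 900),
    ]
    for event in find_wobble_pairs(toy_junctions):
        print(event, frame_effect(event[-1]))
```

Grouping sites by the shared opposite end keeps the pairwise comparison local to each intron family, so the cost stays low even on a genome-wide annotation. A shift that is not a multiple of three changes the reading frame only if the affected region is translated, which is why the frame label should be read as a first-pass indication rather than a functional prediction.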
Abstract (translated from the Italian). Alternative splicing includes many complex mechanisms that can strongly modulate and influence almost every cellular process. Among the types of alternative splicing, a particularly interesting subtype is “wobble splicing” (WS), which involves splice sites located extremely close to each other. It is conceivable that this phenomenon takes part in multiple regulatory pathways; however, only a small number of studies have addressed this topic, which limits its understanding. This work aims to characterize, with computational methods, the splicing junctions involved in WS events, considering both 5’ and 3’ sites. Particular interest is devoted to non-canonical splice sites with a “GC” consensus. The analysis considered both protein-coding and long noncoding genes in the human and mouse genomes. Up-to-date genome annotations were obtained from the GENCODE consortium and used as the starting point to identify the splice sites. Bioinformatics techniques made it possible to carry out a comprehensive characterization of each junction, investigating its frequency and main attributes. According to the results, WS events appear to be rather rare, especially those affecting 5’ sites. In addition, a different distribution was found between protein-coding and long noncoding genes. Interestingly, “GC” donor sites are particularly abundant in WS events, suggesting that they may be relevant regulatory elements. All results observed in the human genome were also confirmed in the mouse genome, showing a good degree of conservation. The splicing efficiency of each site was also calculated in order to establish whether the sites involved in WS have different characteristics from the totality of junctions.
Notably, for both 5’ and 3’ sites, the junctions involved in WS were observed to be decidedly weaker than the corresponding global junctions, suggesting that this type of phenomenon preferentially requires sites that are not efficiently recognized by the spliceosome. Noteworthy differences were also observed when comparing the scores of canonical GT sites and non-canonical GC sites. A correlation analysis between the reciprocal distance of the alternative WS junctions and the variation in their scores revealed contrasting features between 5’ and 3’ events. In particular, these values are negatively correlated at donor sites and positively correlated at acceptor sites. Each WS junction was then classified into categories representing its splicing strength, showing that most sites are weak. When considered in pairs, the junctions involved in the same event often have similar strength scores. Finally, the effects of 5’ WS were investigated with respect to human mRNAs and proteins, in an attempt to establish the possible biological consequences. The events were classified according to the type of effect caused and further investigated to establish their functional relevance. Although many wobbling events were predicted to cause only slight changes, it is still conceivable that, at least in some cases, outcomes of a different kind may occur. In conclusion, WS could represent an important regulatory mechanism in both protein-coding and long noncoding genes. This work provides an updated genome-wide characterization of the sites involved in WS and also answers many questions. Nevertheless, some aspects remain poorly understood, and further studies should be carried out to shed light on this extremely complex topic.
Bioinformatic approaches for the characterization of wobble-splicing events in the human and mouse genomes
SALVIATI, LORENZO
2019/2020
Users are permitted to download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY NC ND license.
For further information, and to check whether the file is available, write to: unitesi@unipv.it.
https://hdl.handle.net/20.500.14239/11914