In silico identification of intrinsically disordered regions (IDRs) in replicative proteins: toward a mesoscale organization of DNA replication

AIUTO, ROSSELLA MARIA
2021/2022

Abstract

DNA replication occurs in subnuclear compartments called replication foci, where newly synthesized DNA accumulates. The distribution pattern of nascent DNA is regulated in space and time from early to late S phase. During S phase, replicative enzymes and accessory proteins are recruited to the foci, forming assemblies called replication factories, which can be viewed as hubs coordinating the functional organization of the nucleus throughout S phase. However, it is still unclear whether the clustering patterns observed for replicative enzymes and the timing of replication domains depend on physiological properties of undiscovered replication factors. An expanding body of work has begun to recognize the role of protein-protein and protein-nucleic acid liquid-liquid phase separation (LLPS) in the generation of membraneless pseudo-organelles and co-localized bodies. Intrinsically disordered regions (IDRs) are often found in proteins that phase separate and can underpin the multivalent interactions that drive phase separation. The involvement of LLPS in the mesoscale organization of replication factories is still an open question. Recently, a new type of in vitro liquid droplet was described whose formation is driven by the IDRs of the Drosophila melanogaster CDT1 and Orc1 subunits of the pre-replication complex (pre-RC). Notably, the Orc1 IDR is conserved in the human orthologue. Identifying the replication proteins that contain IDRs could shed new light on the mesoscale organization of DNA replication and reveal the networks of dynamic interactions that underlie its spatio-temporal control. In this context, the aim of my thesis was to catalogue a collection of replicative proteins according to the presence and features of IDRs, in order to find a few candidates to test in in vitro LLPS assays.
The result is a catalogue of replication-related proteins containing IDRs. For each IDR we report the length, the position and the theoretical isoelectric point (pI). Using three bioinformatic tools, DISOPRED, IUPred2 and Anchor2, we identified 261 replicative proteins with at least one IDR ≥30 aa. These proteins represent 60% of the replicative proteins retrieved from the Gene Ontology database. The list was further reduced on the basis of IDR length (≥150 aa), and the resulting set of 100 proteins was classified into groups according to the number and position of the IDRs. This list includes proteins directly involved in the pre-RC or in the replisome, as well as factors involved in replication-associated processes such as transcription and cell cycle regulation. Since we are mainly interested in proteins directly engaged in replication factories, polypeptides with IDRs ≥150 aa were analysed with STRING, a database of known and predicted protein-protein interactions. In particular, we selected proteins for which a direct association with the pre-RC or the replisome has been demonstrated. The output of this analysis was a short list of replicative factors characterized by an extended IDR. Most of these proteins have an IDR in the N-terminal portion, with a theoretical pI either above 7 (basic) or below 7 (acidic). The only published data on the LLPS of replicative factors identify IDRs rich in positively charged amino acids as the drivers of partitioning. We propose eight proteins, one from the pre-RC and seven from the replisome, as the best candidates to form droplets in vitro. The presence of several proteins whose IDR is rich in acidic amino acids suggested a model for the assembly of the replicative subnuclear bodies. In this model, the high-pI IDRs would be the driving element for phase separation of protein-DNA droplets, as already described for Orc1, by establishing multivalent interactions with DNA. Afterwards, the proteins with acidic IDRs, rich in negatively charged residues, could join the assemblies by establishing specific and transient protein-protein interactions.
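The length-based filtering described above (IDRs ≥30 aa, then ≥150 aa) can be illustrated with a minimal Python sketch. It is not the pipeline used in the thesis: the function name `find_idrs`, the per-residue score list, and the 0.5 disorder cutoff are assumptions made here for illustration, and the scores are synthetic rather than output from DISOPRED, IUPred2 or Anchor2.

```python
from typing import List, Sequence, Tuple


def find_idrs(scores: Sequence[float],
              threshold: float = 0.5,   # assumed disorder cutoff, not from the thesis
              min_len: int = 30) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) spans of consecutive residues
    whose disorder score is >= threshold and whose length is >= min_len."""
    spans: List[Tuple[int, int]] = []
    start = None  # 0-based index where the current disordered run began
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at residue i-1; keep it only if long enough.
            if i - start >= min_len:
                spans.append((start + 1, i))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start >= min_len:
        spans.append((start + 1, len(scores)))
    return spans


# Synthetic per-residue scores for a hypothetical 235-residue protein.
example_scores = [0.9] * 40 + [0.1] * 10 + [0.8] * 20 + [0.2] * 5 + [0.7] * 160

idrs_30 = find_idrs(example_scores, min_len=30)    # first-pass filter
idrs_150 = find_idrs(example_scores, min_len=150)  # stricter filter
print(idrs_30)
print(idrs_150)
```

With these synthetic scores, the 20-residue disordered stretch is discarded by both filters, and only the C-terminal run survives the ≥150 aa cutoff.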

Users may download and share the full-text documents available in UNITESI UNIPV, subject to the Creative Commons CC BY NC ND licence.
Contact: unitesi@unipv.it

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14239/14907