Designing and Implementing a Cloud Analytics solution in the context of the Periscope Project
CURTONI, ANDREA
2020/2021
Abstract
The COVID-19 pandemic was an unexpected disaster for the world. Governments and citizens were unprepared for its consequences, and health systems could not absorb the impact because of a shortage of hospital and intensive-care beds. The impact was also economic, and populations had to adapt to a very different lifestyle. Data played a crucial role in responding to such an unexpected event. However, because pandemics are rare, no comprehensive data collections existed. As a result, several projects were launched to collect large amounts of data and use them to help governments make their decisions. The PERISCOPE project is one of them: it aims to provide information, predictive models and a WebGIS Atlas to help understand the dynamics and consequences of the pandemic. Chapter 3 describes the project in detail, with a focus on the technologies used. My master's thesis addresses three themes in particular, each useful for the continued development of the project. First, an Extract, Transform and Load (ETL) process was carried out on three datasets, each from a different source, which were then imported into the PERISCOPE architecture, moving them from the Data Lake into the Data Warehouse. This topic is covered in Chapter 4, which describes the structure of the Data Warehouse and its most significant tables, then presents the raw datasets and their transformation. All data were collected, studied and organized so that they could be integrated into the Data Warehouse as effectively as possible. The tools used for this part are AWS Glue and AWS Glue Studio. Chapter 5 then describes in detail the implementation of the Statistical Analysis Sandbox on AWS EKS.
That chapter first explains why such an environment was deployed and what specific role it plays within the PERISCOPE architecture, then clarifies the choices made during configuration, and ends with a presentation of all the available features. The resulting environment is designed to serve the data scientists, statisticians and other analysts working on the PERISCOPE project. Chapter 6 then demonstrates the potential of the environment by applying it to the analysis of mental healthcare data. In particular, data on unique cases of anxiety in the populations of four countries were cross-referenced with other data sources in order to look for recurring patterns across the countries. The analysis showed an increase in anxiety cases during the pandemic period compared with the preceding period. Government decisions also appear to have influenced this increase. Finally, Chapter 7 draws the conclusions of the work and outlines directions for future development.
Users may download and share the full-text documents available in UNITESI UNIPV in accordance with the Creative Commons CC BY-NC-ND license.
For more information, or to check whether the file is available, write to: unitesi@unipv.it.
https://hdl.handle.net/20.500.14239/14703