Using the MuZero Reinforcement Learning for Stand Allocation at the Orio al Serio Airport

NURKOO, ASHINA

2023/2024

Abstract

This thesis presents the development and integration of a reinforcement learning framework for optimizing airport stand allocation, with a specific focus on the turnaround process at Bergamo Orio al Serio Airport (BGY). The work has two main components: an event-driven simulator that accurately reproduces aircraft operations and ground handling processes using real-world flight schedules and stochastic delay models, and an adaptation of MuZero, a model-based reinforcement learning algorithm, to interact with and learn from this asynchronous environment. MuZero, developed by DeepMind, has achieved superhuman performance in board games such as Go, chess, and shogi by learning a model of the environment without prior access to its dynamics. Because MuZero is typically applied to synchronous, fully observable domains, deploying it in an event-driven airport simulation required substantial architectural adaptations. A key challenge was building the simulator itself from the ground up so that it accurately models the airport’s operational dynamics. Additional complexities included integrating MuZero with a dynamic and partially observable state space, generating valid actions in real time, and synchronizing MuZero’s planning process with the simulator’s event-based execution. Training was conducted over multiple self-play iterations using actual operational data from four months of airport activity. The agent’s performance was evaluated by pitting it against previous policy versions, which showed progressive improvements in efficiency and decision quality. The integration was validated through standalone testing of the simulator and of the complete reinforcement learning loop. This research demonstrates the feasibility and potential of applying advanced reinforcement learning algorithms like MuZero to real-world, stochastic, and asynchronous logistics environments. It lays a foundation for future applications in airport operations, logistics, and broader multi-agent coordination scenarios.
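
As a rough illustration of the ideas named in the abstract (event-driven execution, stochastic delays, and generation of valid actions), the following minimal sketch may help. It is not the thesis's simulator: the class `StandSimulator`, the exponential delay model with a 10-minute mean, the fixed 45-minute turnaround, and the `policy` callback are all hypothetical choices made only for this example. A learned agent such as MuZero would take the place of the simple policy function.

```python
import heapq
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(order=True)
class Event:
    time: float                                   # only field used for ordering
    kind: str = field(compare=False)              # "arrival" or "departure"
    flight: str = field(compare=False)
    stand: Optional[int] = field(default=None, compare=False)


class StandSimulator:
    """Toy event-driven stand allocation loop (illustrative assumptions only)."""

    def __init__(self, num_stands: int, turnaround: float = 45.0, seed: int = 0):
        self.free: List[bool] = [True] * num_stands
        self.queue: List[Event] = []
        self.rng = random.Random(seed)
        self.turnaround = turnaround              # assumed fixed, in minutes
        self.now = 0.0

    def schedule_arrival(self, flight: str, scheduled_time: float) -> None:
        # Assumed delay model: exponential with a mean of 10 minutes.
        delay = self.rng.expovariate(1 / 10.0)
        heapq.heappush(self.queue, Event(scheduled_time + delay, "arrival", flight))

    def valid_actions(self) -> List[int]:
        # The legal actions at a decision point are the currently free stands.
        return [i for i, is_free in enumerate(self.free) if is_free]

    def run(self, policy: Callable[[str, List[int]], int]) -> Dict[str, Optional[int]]:
        assignments: Dict[str, Optional[int]] = {}
        while self.queue:
            event = heapq.heappop(self.queue)     # next event in time order
            self.now = event.time
            if event.kind == "departure":
                self.free[event.stand] = True     # stand is released
                continue
            actions = self.valid_actions()
            if not actions:
                assignments[event.flight] = None  # sketch: flight stays unassigned
                continue
            stand = policy(event.flight, actions) # the agent decides here
            self.free[stand] = False
            assignments[event.flight] = stand
            heapq.heappush(
                self.queue,
                Event(self.now + self.turnaround, "departure", event.flight, stand),
            )
        return assignments


def first_free_stand(flight: str, actions: List[int]) -> int:
    # Placeholder policy: always pick the lowest-numbered free stand.
    return actions[0]


sim = StandSimulator(num_stands=3, seed=42)
for name, minute in [("F1", 0.0), ("F2", 5.0), ("F3", 20.0)]:
    sim.schedule_arrival(name, minute)
result = sim.run(first_free_stand)
```

In this sketch the simulator advances only when an event fires, so decisions are requested at irregular times rather than at fixed steps, which is the asynchrony the abstract refers to.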

Record

Faculty/Department	DIPARTIMENTO DI INGEGNERIA INDUSTRIALE E DELL'INFORMAZIONE
Degree programme	COMPUTER ENGINEERING [06415]
Academic year	2023
English title	Using the MuZero Reinforcement Learning for Stand Allocation at the Orio al Serio Airport
Supervisor	PIASTRA, MARCO
Appears in collections	Master's degree theses (Lauree Magistrali)

Files in this item:

File	Access	Description	Size	Format
NurkooAshinaThesis.pdf	Open access	This thesis presents the development and integration of MuZero, a reinforcement learning framework, for optimizing airport stand allocation, with a specific focus on the turnaround process at Bergamo Orio al Serio Airport (BGY).	2.45 MB	Adobe PDF

Users may download and share the full-text documents available in UNITESI UNIPV, subject to the Creative Commons CC BY NC ND license.
For more information, and to check whether the file is available, write to: [email protected].

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14239/33416