Analyzing Jailbreaking in LLMs Through Pragmatics and Bias Studies

TORCHIO, FRANCESCA
2023/2024

Abstract

This work focuses on Large Language Models (LLMs) and some of the issues they pose. In particular, we delve into the phenomenon of jailbreaking, i.e., the use of specially crafted prompts to bypass an LLM's restrictions and elicit problematic content. Jailbreaking is examined from three perspectives: first, we review the existing literature categorizing jailbreaking prompts; second, we compare the literature on jailbreaking with the literature on bias in Natural Language Processing (NLP); third, we provide a pragmatic analysis of jailbreaking based on human deception strategies.
Files in this item:
Torchio_Tesi.pdf (open access, Adobe PDF, 7.88 MB)

Users may download and share the full-text documents available in UNITESI UNIPV under the terms of the Creative Commons CC BY-NC-ND license.
For further information, or to check whether a file is available, write to: unitesi@unipv.it.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14239/26344