Analyzing Jailbreaking in LLMs Through Pragmatics and Bias Studies

TORCHIO, FRANCESCA
2023/2024

Abstract

This work focuses on Large Language Models (LLMs) and some of the issues they pose. In particular, we delve into the phenomenon of jailbreaking, i.e., the use of specially crafted prompts to bypass an LLM's restrictions and elicit problematic content. Jailbreaking is examined from three perspectives: first, we review the existing literature categorizing jailbreaking prompts; second, we compare the literature on jailbreaking with the literature on bias in Natural Language Processing (NLP); third, we provide a pragmatic analysis of jailbreaking based on human deception strategies.
Files in this item:
Torchio_Tesi.pdf (open access, Adobe PDF, 7.88 MB)

Users may download and share the full-text documents available in UNITESI UNIPV under the terms of the Creative Commons CC BY-NC-ND license.
For further information, or to check whether a file is available, write to: unitesi@unipv.it.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14239/26344