Parameter-Efficient Fine-Tuning for Multilingual Animacy Classification: Probing Syntax-Semantics Interactions
GAY, MATTEO
2023/2024
Abstract
Animacy is widely recognized in experimental and theoretical literature as a fundamental category in language processing and production. Linguistic research has primarily focused on grammaticalized animacy in languages that encode it with explicit morphosyntactic marking. This study instead investigates animacy as an inherent extra-linguistic property (Comrie, 1989), grounded in biological and ontological distinctions (semantic animacy). Despite its importance, no large-scale quantitative analysis has yet examined the interaction between animacy and the syntactic-semantic level. To address this gap, the thesis first introduces a pipeline for multilingual animacy classification, leveraging fine-tuned transformer-based language models and Word Sense Disambiguation datasets (XL-WSD, Pasini et al., 2021). The proposed approach achieves a macro F1 score exceeding 0.96 across ten languages, classifying animacy into three categories: Human, Animate, and Inanimate. Next, using Universal Dependencies treebanks enriched with machine-inferred animacy labels, a proof-of-concept analysis explores animacy’s influence on syntax. The findings suggest that animacy functions as a “soft constraint” (Bresnan et al., 2001), influencing grammatical role assignment and verb selection patterns. Data-driven and information-theoretic analyses reveal relevant correlations between animacy and subject/object roles, as well as a reduction in syntactic entropy in the selection of verb arguments. Finally, building on Ji & Liang (2018), this study demonstrates the feasibility of deriving a three-tiered animacy hierarchy solely from the co-occurrence patterns between verbs and their associated human subjects.

File | Size | Format | |
---|---|---|---|
Gay_LM_thesis_pdfA.pdf (not available) | 2.46 MB | Adobe PDF | Request a copy |
Users may download and share the full-text documents available in UNITESI UNIPV, subject to the terms of the Creative Commons CC BY NC ND license.
For more information, or to check whether the file is available, write to: unitesi@unipv.it.
https://hdl.handle.net/20.500.14239/27789