A computational approach to lexical semantic shift across time and domain : methods and applications
Authors
Gruppi, Mauricio
Issue Date
2022-12
Type
Electronic thesis
Thesis
Language
en_US
Keywords
Computer science
Alternative Title
Abstract
Neural natural language models are designed to learn word and sequence representations from large volumes of text. Such volumes of data are typically obtained by merging multiple heterogeneous corpora from the Web.
However, language use is entrenched in the social context in which it appears, and linguistic variation reflects social differentiation along lines such as ethnicity, gender, sex, and social class.
Words may have their meanings altered based not only on the lexical context but also on the social context in which they emerge, becoming associated with the group or community that uses them.
These changes are the object of study of computational semantic shift methods, most of which are currently designed to handle temporal language change, or linguistic evolution, with little effort devoted to characterizing changes across domains. In this work, we proposed a method to improve current semantic shift techniques in cross-domain tasks and demonstrated its capability in unsupervised feature learning tasks. We focused on addressing the two major challenges of this problem: the assumption of gradual language change used in temporal analysis, and the lack of labeled data for supervised learning.
In particular, we designed a self-supervised learning method to obtain monolingual mappings of words, and showed that it surpasses the performance of state-of-the-art baselines in both temporal and cross-domain detection.
Moreover, we designed a framework for the explainability of semantic shifts based on the learned mappings, showing the words that are semantically shifted across input sources and explaining each shift via word representatives and example sentences. Finally, we confirmed that semantic shift can be used for domain differentiation by applying it in a study of scientific news source credibility. The study showed that by using semantic shift in conjunction with citation and copy behavior as measures of concordance of news sources, we could learn representations that capture relevant information about them, such as credibility and political bias, creating clusters of sources that share similar traits.
A qualitative analysis of the observed clusters using semantic shift allowed us to characterize clusters of political conspiracy theorists and sources that propagate pseudoscience/health conspiracy theories.
Description
December 2022
School of Science
Full Citation
Publisher
Rensselaer Polytechnic Institute, Troy, NY