Neural models for causal information extraction using domain adaptation

Authors
Saha, Anik
Other Contributors
Tajer, Ali
Chen, Tianyi
Yener, Bulent
Issue Date
2023-08
Keywords
Electrical engineering
Degree
MS
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute (RPI), Troy, NY. Copyright of original work retained by author.
Abstract
The task of identifying causally related events or actions in text is an important step toward building a knowledge graph of events and consequences from the vast amount of unlabeled documents available in the digital age. As there has been very little work on this problem, we perform a comparative study of different neural models on four data sets with causal labels. We train sequence-tagging and span-based models to extract causally related events from their textual descriptions. Our experiments affirm that large pre-trained language models such as BERT can be fine-tuned on labeled data sets to outperform traditional deep learning models like LSTMs. Our results show that span-based models are better at classifying spans of words as cause or effect than sequence-tagging models using the same pre-trained BERT weights. The length of the spans labeled as causes and effects in a data set also has a significant impact on the advantage of using a span-based model. Toward the goal of developing a general-purpose model for extracting causal knowledge from text, we focus on the unsupervised domain adaptation (UDA) scenario, in which a model trained on a source domain is adapted to a new domain without any labels. Several studies on the UDA task for text classification have shown the effectiveness of adversarial domain adaptation. We investigate the effect of integrating linguistic information into the adversarial domain adaptation framework for the causal information extraction task. We show the advantage of leveraging word dependency relationships when adapting word-based neural models like LSTMs to new domains. We also find that guiding the adversarial domain classifier to align the task classifier's outputs is more effective than requiring the encoder outputs to have similar distributions in the two domains.
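The adversarial domain adaptation approach mentioned in the abstract is typically implemented with a gradient reversal layer (as in DANN): the domain classifier's gradient is negated before it reaches the encoder, so the encoder learns domain-invariant features. The following is a minimal, hypothetical one-dimensional sketch of that gradient flow; the scalar weights, the lambda value, and the logistic heads are illustrative assumptions, not the thesis's actual architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy setup: a 1-D linear "encoder" feeding two logistic heads
# (a task classifier and a domain classifier). All weights are illustrative.
w_enc, w_task, w_dom = 0.5, 1.0, 1.0
lam = 0.1  # gradient-reversal strength (lambda)

def dann_encoder_grad(x, y_task, y_dom):
    """Gradient w.r.t. the encoder weight under adversarial training:
    the task-loss gradient flows through normally, while the domain-loss
    gradient is negated by the gradient reversal layer."""
    f = w_enc * x                                 # encoder feature (forward)
    # Binary cross-entropy gradient of the task head w.r.t. the feature
    g_task = (sigmoid(w_task * f) - y_task) * w_task
    # Same form for the domain head; reversed before reaching the encoder
    g_dom = (sigmoid(w_dom * f) - y_dom) * w_dom
    # Combined encoder gradient: task term minus lambda * domain term
    return (g_task - lam * g_dom) * x

g = dann_encoder_grad(x=1.0, y_task=1.0, y_dom=0.0)  # ≈ -0.4398
```

The sign flip on the domain term is the whole trick: the encoder is updated to *increase* the domain classifier's loss, pushing source and target feature distributions together while the task head is still trained normally.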
Description
August 2023
School of Engineering
Department
Dept. of Electrical, Computer, and Systems Engineering
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection
Access
Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 license. No commercial use or derivatives are permitted without the explicit approval of the author.