Neural models for causal information extraction using domain adaptation

Saha, Anik
Other Contributors
Tajer, Ali
Chen, Tianyi
Yener, Bulent
Issue Date
Electrical engineering
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute (RPI), Troy, NY. Copyright of original work retained by author.
Full Citation
The task of identifying causally related events or actions in text is an important step in building a knowledge graph of events and consequences from the vast amount of unlabeled documents available to us in the digital age. Since there has been very little work on this problem, we perform a comparative study of different neural models on four data sets with causal labels. We train sequence tagging and span-based models to extract causally related events from their textual descriptions. Our experiments confirm that large pre-trained language models such as BERT can be fine-tuned on labeled data sets to outperform traditional deep learning models like LSTMs. Our results show that span-based models are better at classifying spans of words as cause or effect than sequence tagging models using the same pre-trained BERT weights. The length of the spans labeled as causes and effects in a data set also has a significant impact on the advantage of using a span-based model. Toward the goal of developing a general-purpose model for extracting causal knowledge from text, we focus on the unsupervised domain adaptation (UDA) scenario, in which a model trained on a source domain is adapted to a new domain without any labels. Several studies on UDA for text classification have shown the effectiveness of the adversarial domain adaptation method. We investigate the effect of integrating linguistic information into the adversarial domain adaptation framework for the causal information extraction task. We show the advantage of leveraging word dependency relations when adapting word-based neural models like LSTMs to new domains. We also find that guiding the adversarial domain classifier to align the task classifier's output is more effective than requiring the encoder outputs to have similar distributions in the two domains.
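The adversarial domain adaptation mentioned in the abstract is commonly implemented with a gradient reversal layer (Ganin and Lempitsky's approach): the layer is the identity in the forward pass, but flips and scales gradients in the backward pass so the encoder learns domain-invariant features while the domain classifier tries to tell domains apart. A minimal sketch in plain NumPy with a hand-rolled forward/backward interface — all names and the `lam` coefficient here are illustrative, not the thesis implementation:

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer: identity forward, negated (scaled) gradient backward."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam  # trade-off coefficient for the adversarial signal

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Pass encoder features through unchanged to the domain classifier.
        return x

    def backward(self, grad_out: np.ndarray) -> np.ndarray:
        # Reverse the domain classifier's gradient before it reaches the
        # encoder, pushing the encoder toward domain-confusing features.
        return -self.lam * grad_out

layer = GradReverse(lam=0.5)
features = np.array([1.0, -2.0, 3.0])
print(layer.forward(features))                     # unchanged features
print(layer.backward(np.array([0.2, 0.2, 0.2])))   # flipped, scaled gradient
```

In a full training loop, this layer would sit between the shared encoder (e.g. an LSTM or BERT) and the domain classifier head, while the task classifier head receives the encoder output directly.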
School of Engineering
Dept. of Electrical, Computer, and Systems Engineering
Rensselaer Polytechnic Institute, Troy, NY
Rensselaer Theses and Dissertations Online Collection
Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 license. No commercial use or derivatives are permitted without the explicit approval of the author.