Joint information extraction

Li, Qi
Other Contributors
Ji, Heng
Hendler, James A.
Fox, Peter A.
Roth, Dan
Bikel, Daniel M.
Issue Date
Computer science
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
Full Citation
Focusing on entity mention extraction, relation extraction, and event extraction, the main part of this thesis presents a novel sentence-level joint information extraction (IE) framework based on structured prediction and inexact search. In this framework, the three types of IE components are extracted simultaneously, which alleviates the error propagation problem, and various global features can be used to produce more accurate and coherent results. Experimental results on the ACE corpora show that our joint model achieves state-of-the-art performance on each stage of the extraction. We then go beyond the sentence level and improve extraction in the cross-document setting: we use an integer linear programming (ILP) formulation to conduct cross-document inference, so that many spurious results can be effectively filtered out based on the inter-dependencies among facts from different places. Finally, to investigate cross-lingual dependencies, we present a CRF-based joint bilingual name tagger for parallel corpora and demonstrate how this method can enhance name-aware machine translation.
May 2015
School of Science
Dept. of Computer Science
Rensselaer Polytechnic Institute, Troy, NY
Rensselaer Theses and Dissertations Online Collection
Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.