Author
Pan, Xiaoman
Other Contributors
Ji, Heng; Hendler, James A.; McGuinness, Deborah L.
Date Issued
2019-12
Subject
Computer science
Degree
MS
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
Abstract
In this thesis, we propose a Cross-lingual Entity Extraction and Linking framework that supports fine-grained types and 300 languages that exist in Wikipedia. Given a document in any of these languages, our framework extracts entity mentions, assigns a fine-grained type to each mention, and links it to Wikipedia. We develop a series of new knowledge base mining approaches: generating “silver-standard” entity annotations, transferring annotations from English to other languages through cross-lingual links, refining annotations using self-training, deriving language-specific morphology features from anchor links, and training cross-lingual joint entity and word embeddings on cross-lingual data that mixes entities and contextual words from Wikipedia. Both entity extraction and linking results are promising on intrinsic Wikipedia data and extrinsic non-Wikipedia data.
Description
December 2019; School of Science
Department
Dept. of Computer Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection
Access
Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.