Cross-lingual entity extraction and linking

Authors
Pan, Xiaoman
Other Contributors
Ji, Heng
Hendler, James A.
McGuinness, Deborah L.
Issue Date
2019-12
Keywords
Computer science
Degree
MS
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
Abstract
In this thesis, we propose a cross-lingual entity extraction and linking framework covering fine-grained types and 300 languages that exist in Wikipedia. Given a document in any of these languages, the framework extracts entity mentions, assigns a fine-grained type to each mention, and links each mention to Wikipedia. We apply a series of new knowledge base mining methods: generating “silver-standard” entity annotations, transferring annotations from English to other languages through cross-lingual links, refining annotations using self-training, deriving language-specific morphology features from anchor links, and training joint cross-lingual entity and word embeddings on generated cross-lingual data that mixes entities and contextual words from Wikipedia. Both entity extraction and linking results are promising on intrinsic Wikipedia data and extrinsic non-Wikipedia data.
Description
December 2019
School of Science
Department
Dept. of Computer Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection
Access
Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.