Cross-lingual entity extraction and linking

Authors
Pan, Xiaoman
Issue Date
2019-12
Type
Electronic thesis
Thesis
Language
ENG
Keywords
Computer science
Abstract
In this thesis, we propose a cross-lingual entity extraction and linking framework that covers fine-grained entity types and 300 languages present in Wikipedia. Given a document in any of these languages, the framework extracts entity mentions, assigns a fine-grained type to each mention, and links it to Wikipedia. We develop a series of new knowledge base mining methods: generating “silver-standard” entity annotations, transferring annotations from English to other languages through cross-lingual links, refining annotations using self-training, deriving language-specific morphology features from anchor links, and training joint cross-lingual entity and word embeddings on generated cross-lingual data that mixes entities with contextual words from Wikipedia. Both entity extraction and linking results are promising on intrinsic Wikipedia data and extrinsic non-Wikipedia data.
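The first two mining steps named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: it assumes wiki-style anchor markup of the form [[Title]] or [[Title|surface text]], and the function names (silver_annotations, project_types), the type labels (GPE, ORG), and the sample link table are hypothetical, chosen only for illustration.

```python
import re

# Wiki-style anchor: [[Target]] or [[Target|surface text]] (assumed markup).
ANCHOR = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def project_types(english_types, langlinks):
    """Transfer entity types from English titles to foreign titles.

    langlinks maps a foreign-language title to its English title
    (hypothetical table standing in for cross-lingual links).
    """
    return {
        foreign: english_types[english]
        for foreign, english in langlinks.items()
        if english in english_types
    }


def silver_annotations(wikitext, title_to_type):
    """Strip anchors and return (plain_text, spans).

    Each span is (start, end, title, type) with offsets into plain_text.
    Anchors whose target has no known type are kept as text but not annotated.
    """
    pieces, spans = [], []
    src_pos = 0   # read position in the marked-up text
    out_len = 0   # length of the plain text produced so far
    for match in ANCHOR.finditer(wikitext):
        before = wikitext[src_pos:match.start()]
        pieces.append(before)
        out_len += len(before)

        title = match.group(1).strip()
        surface = match.group(2) or match.group(1)
        entity_type = title_to_type.get(title)
        if entity_type is not None:
            spans.append((out_len, out_len + len(surface), title, entity_type))

        pieces.append(surface)
        out_len += len(surface)
        src_pos = match.end()
    pieces.append(wikitext[src_pos:])
    return "".join(pieces), spans


# Illustrative data only; labels and titles are assumptions.
english_types = {"Troy, New York": "GPE"}
langlinks = {"Troy (New York)": "Troy, New York"}
german_types = project_types(english_types, langlinks)
text, spans = silver_annotations(
    "Das RPI liegt in [[Troy (New York)|Troy]].", german_types
)
```

In this sketch, the anchor text becomes the mention span, the link target becomes the linked entity, and the type comes from the English side of the cross-lingual link. The later steps in the abstract (self-training, morphology features, joint embeddings) are not covered here.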
Description
December 2019
School of Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY