Unsupervised graph-based relation extraction and validation for knowledge base population
dc.rights.license | CC BY-NC-ND. Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. No commercial use or derivatives are permitted without the explicit approval of the author. | |
dc.contributor | Ji, Heng | |
dc.contributor | Grishman, Ralph | |
dc.contributor | McGuinness, Deborah L. | |
dc.contributor | Fox, Peter A. | |
dc.contributor.author | Yu, Dian | |
dc.date.accessioned | 2021-11-03T08:56:05Z | |
dc.date.available | 2021-11-03T08:56:05Z | |
dc.date.created | 2018-02-21T13:58:37Z | |
dc.date.issued | 2017-12 | |
dc.identifier.uri | https://hdl.handle.net/20.500.13015/2122 | |
dc.description | December 2017 | |
dc.description | School of Science | |
dc.description.abstract | Knowledge bases (KBs), which store millions of facts about the world, have been widely applied to a broad range of applications such as semantic search and question answering. Each relational fact contains two entities (e.g., a person and a location) and the relation between them. However, existing KBs are far from complete. Manually updated Wikipedia Infoboxes still serve as an important structured input for many large-scale KBs. Furthermore, completing KBs by inferring missing relations from existing structured data cannot fully solve this problem, since KBs mainly focus on famous entities. | |
dc.description.abstract | To populate KBs, researchers have made significant progress in relation extraction from unstructured text corpora. However, the task remains very challenging, since a relation can be expressed in numerous ways through sophisticated long-range linguistic structures. Previous successful methods require sufficient clean training data, external knowledge bases, or high-quality patterns, which results in extensive human involvement and poor portability to a new relation type or a different language. Consolidating relations extracted by multiple relation extraction systems from multiple information sources may also produce erroneous, conflicting, redundant, or complementary results, caused by differences in source trustworthiness and significant differences in performance among systems. In many cases, certain facts can only be discovered by a minority of advanced systems from a few trustworthy sources. This poses a challenge but also an opportunity for KB fact validation. | |
dc.description.abstract | In this thesis, we aim to improve multilingual knowledge base population by designing unsupervised graph-based methods to extract and validate relations from unstructured textual data. We develop language- and relation-independent methods that can be adapted to a new language or relation with less effort. We also aim to further improve the performance of a single relation extraction system by incorporating evidence from multiple information sources and multiple systems. To evaluate the effectiveness of our approach, we choose Slot Filling as our evaluation platform; this task aims to extract the values of a variety of predefined attributes for a given entity from a large-scale corpus and to provide justification sentences that support these values. | |
dc.language.iso | ENG | |
dc.publisher | Rensselaer Polytechnic Institute, Troy, NY | |
dc.relation.ispartof | Rensselaer Theses and Dissertations Online Collection | |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
dc.subject | Computer science | |
dc.title | Unsupervised graph-based relation extraction and validation for knowledge base population | |
dc.type | Electronic thesis | |
dc.type | Thesis | |
dc.digitool.pid | 178780 | |
dc.digitool.pid | 178781 | |
dc.digitool.pid | 178782 | |
dc.rights.holder | This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author. | |
dc.description.degree | PhD | |
dc.relation.department | Dept. of Computer Science |
This item appears in the following Collection(s)
- RPI Theses Online (Complete): Rensselaer theses from 2006; many restricted to current RPI Students, Faculty and Staff
- RPI Theses Open Access: Rensselaer Theses and Dissertations with Creative Commons Licenses