Knowledge base construction from scientific literature

Authors
Wang, Han
Issue Date
2016-12
Type
Electronic thesis
Thesis
Language
ENG
Keywords
Multidisciplinary science
Abstract
Knowledge Bases (KBs) have become a widely used repository where both humans and software agents can look up confirmed facts about the world. Because KBs are applied so widely, automatically constructing generic or domain-specific KBs from information extracted from sources such as web pages, reports, and research papers has become an active area of work in both academia and industry.
This dissertation presents SciKB, an end-to-end Knowledge Base Construction system that takes a collection of research articles from a given scientific domain as input and outputs a domain-specific KB. The resulting KB contains fact triples extracted from the input documents, together with hierarchical clusters of the entities and relations involved in those facts. Each cluster groups entities or relations with similar semantic meanings, and the hierarchies serve as an implicit schema of the KB.
SciKB adopts an open information extraction approach to extract fact triples from the input documents, then jointly learns distributed representations of the involved entities and relations in an unsupervised fashion, and finally uses these representations to organize the entities and relations into hierarchical clusters. Experiments evaluating each component of the SciKB pipeline demonstrate its effectiveness in two scientific domains: Biomedical Science and Earth Science.
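The step that organizes entities into hierarchical clusters can be illustrated with a minimal Python sketch. It assumes entity embeddings have already been learned. The entity names and three-dimensional vectors below are made-up toy values, not SciKB output, and the helper functions (`cosine`, `average_linkage`, `leaves`, `agglomerate`) are hypothetical. Average-linkage agglomerative clustering over cosine similarity is used here only as a stand-in, since the abstract does not specify which clustering algorithm SciKB uses.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings (NOT SciKB output): entity name -> vector.
EMBEDDINGS = {
    "protein": [0.9, 0.1, 0.0],
    "enzyme": [0.8, 0.2, 0.1],
    "sediment": [0.1, 0.9, 0.2],
    "soil": [0.0, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def leaves(tree):
    """Flatten a nested-tuple tree into the list of entity names it contains."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def average_linkage(c1, c2, emb):
    """Mean pairwise cosine similarity between the members of two clusters."""
    sims = [cosine(emb[a], emb[b]) for a in c1 for b in c2]
    return sum(sims) / len(sims)


def agglomerate(emb):
    """Repeatedly merge the two most similar clusters until one tree remains.

    Returns a binary hierarchy as nested tuples: each inner tuple is a
    cluster formed by merging its two children (an assumed stand-in for
    SciKB's unspecified clustering method).
    """
    clusters = list(emb)  # start with one singleton cluster per entity
    while len(clusters) > 1:
        # Pick the pair of cluster indices with the highest average linkage.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(
                leaves(clusters[p[0]]), leaves(clusters[p[1]]), emb
            ),
        )
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


hierarchy = agglomerate(EMBEDDINGS)
print(hierarchy)
```

With these toy vectors, the two biomedical terms and the two earth-science terms end up in separate top-level subtrees; the nested tuples play the role of a hierarchy that could serve as an implicit schema. The same procedure would apply to relation embeddings.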
Description
December 2016
School of Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY