Author
Liu, Yue
Other Contributors
McGuinness, Deborah L.; Hendler, James A.; Ji, Heng; Hahn, Juergen;
Date Issued
2019-08
Subject
Computer science
Degree
PhD;
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.;
Abstract
Ontologies provide terms to describe and represent specific knowledge and are thus widely used in many Semantic Web applications for knowledge management purposes. Since creating ontologies manually can be extremely labor-intensive and time-consuming, there is growing motivation to automate the process, which includes the automation of ontology components such as classes, relations, and attributes, as well as the overall term coverage and structure. Towards the goal of ontology automation in the Semantic Web, this thesis focuses on the portion of "Ontology Learning" that seeks automatic or semi-automatic approaches for either creating ontology resources or reusing existing ones with certain task-oriented objectives. We aim to enhance aspects of ontology learning with methods that can be reused in different domains and applied at large scale using machine learning and natural language processing techniques. We consider classes, relations, and attributes to be the three major components of an ontology and study three areas of ontology learning with respect to the automation of these components.
In the first area, we study the extraction and linking of ontology classes from unstructured text, which aims to enhance the automation of ontology classes. In the second area, we study the extraction of knowledge in a structured format from unstructured text, which aims to enhance the automation of ontology classes, relations, and attributes as a whole. In the third area, we study automatic ontology generation based on components such as classes, relations, and attributes that are automatically learned from raw data. In this thesis, we contribute robust approaches with novel applications of machine learning and natural language processing techniques to address three representative tasks across the three areas we studied, all closely related to the process of ontology automation. In the first contribution, in which we study the extraction and linking of ontology classes from unstructured text, we present a method for extracting noun phrases in the text that can be ontology classes and linking them to appropriate existing ontologies. Instead of defining new ontology terms, we demonstrate the method by linking 3000 noun phrases extracted from biomedical and clinical literature to terms in existing biomedical ontologies with an innovative application of the word embedding technique. We demonstrate that training with a task-oriented resource can achieve state-of-the-art performance on a variety of tasks, such as abbreviation expansion and synonym detection, especially on domain-specific texts, such as clinical notes and electronic health records (EHRs). Our proposed approach has been adopted by many external researchers in the area, and we are able to enrich over 1000 classes in 112 biomedical ontologies.
In our second contribution, we study the extraction of knowledge in a structured format from unstructured text. We focus on extracting RDF triples that are compatible with a given vocabulary, enabling knowledge to be represented in a machine-readable way. Inspired by neural machine translation, we propose an end-to-end model that extracts facts in the form of RDF triples directly from unstructured text. Our model outperforms state-of-the-art pipeline-based methods, including entity linking and relation extraction, on three different kinds of data sets. We demonstrate that the use of Knowledge Graph (KG) embeddings in machine learning models has the potential to bridge the gap between natural language text and graph nodes, thus improving overall performance.
Our third contribution is a data-driven approach for bottom-up ontology generation with the specific objective of learning a harmonized catalog of items. In this task, we study a harmonized product catalog ontology in the area of e-commerce, where we have 730,901 e-commerce product records scraped from the web. With the objective of improving categorization and faceted search, we propose effective approaches to build the ontology through a series of tasks including attribute extraction, product encoding, and concept hierarchy generation with machine learning and natural language processing techniques. We demonstrate that the harmonized product catalog ontology generated by our data-driven approach can be used to improve product categorization and faceted search according to industry-provided criteria.;
Description
August 2019; School of Science
Department
Dept. of Computer Science;
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection;
Access
Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.;