Curation and completion of knowledge graphs

Authors
Dutta, Sharmishtha
Issue Date
2025-08
Type
Electronic thesis
Thesis
Language
en_US
Keywords
Computer science
Abstract
Knowledge graphs (KGs) store facts expressed as relationships between entities. Each fact is represented as an ordered triple (h, r, t), that is, (head, relation, tail). Because real-world KGs are incomplete, contain incorrect facts, and are sparse, they motivate the task of link prediction, or KG completion, in which a machine learning model predicts missing entities or relations by learning patterns from a training graph. This thesis contributes to the curation and completion of knowledge graphs, addressing the following challenges.
1. In Chapter 2, we design a novel ontology to capture malware threat intelligence from unstructured text. Combined with TINKER, a framework we developed, the ontology is used to instantiate a malware knowledge graph. We demonstrate the graph's competency by evaluating it with prominent KG completion models.
2. In Chapter 3, we consider the inductive KG completion task, in which some entities are not seen during training. The best-performing models in the inductive setting use path encoding modules in addition to subgraph encoding modules. We propose a scalable and efficient alternative to the explicit use of paths. Experiments on existing inductive KG completion benchmark datasets demonstrate the efficacy of our model.
3. In Chapter 4, we identify gaps in the benchmark datasets and experimental setup of inductive KG completion. We establish a new research direction, realistic and temporally aware inductive KG completion, by providing appropriate datasets and baseline models. Specifically, we propose an algorithm and use it to generate datasets that account for temporal progression, with varying degrees of entity overlap and independent validation context graphs. We suggest an alternative negative sampling strategy for improved representation learning. Finally, we propose two baseline models that address the temporal aspect and entity overlap in the datasets.
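To make the terms in the abstract concrete, the following is a minimal illustrative sketch, not code from the thesis. It shows an (h, r, t) triple, how an inductive split can be recognized by test entities that never appear in training, how entity overlap between splits can be measured, and one common baseline form of negative sampling (uniform tail corruption with known facts filtered out). The toy malware-domain graph and all helper names (`Triple`, `unseen_entities`, `entity_overlap`, `corrupt_tail`) are hypothetical. Uniform corruption is shown only as a familiar baseline; it is not the alternative negative sampling strategy proposed in Chapter 4.

```python
import random
from typing import NamedTuple


class Triple(NamedTuple):
    """One KG fact: an ordered (head, relation, tail) triple."""
    head: str
    relation: str
    tail: str


def entities_of(triples):
    """Collect every entity that appears as a head or a tail."""
    return {t.head for t in triples} | {t.tail for t in triples}


def unseen_entities(train, test):
    """Entities in the test graph that never occur in the training graph.

    A non-empty result is what makes a split inductive: the model must score
    triples involving entities it learned no embedding for during training.
    """
    return entities_of(test) - entities_of(train)


def entity_overlap(train, test):
    """Fraction of test entities that were already seen in training."""
    test_ents = entities_of(test)
    if not test_ents:
        return 0.0
    return len(test_ents & entities_of(train)) / len(test_ents)


def corrupt_tail(triple, candidates, known, rng):
    """Baseline uniform negative sampling (assumption, not the thesis method):
    replace the tail with a random candidate entity, rejecting any corruption
    that is itself a known fact."""
    options = sorted(e for e in candidates
                     if Triple(triple.head, triple.relation, e) not in known)
    if not options:
        raise ValueError("no valid negative tail for this triple")
    return Triple(triple.head, triple.relation, rng.choice(options))


# Hypothetical toy graph; entity and relation names are illustrative only.
train = [
    Triple("emotet", "uses", "phishing"),
    Triple("emotet", "targets", "windows"),
    Triple("trickbot", "uses", "phishing"),
]
test = [
    Triple("qakbot", "uses", "phishing"),  # "qakbot" never appears in training
]

print(sorted(unseen_entities(train, test)))  # ['qakbot']
print(entity_overlap(train, test))           # 0.5

known = set(train) | set(test)
rng = random.Random(0)
negative = corrupt_tail(train[0], entities_of(train), known, rng)
print(negative)
```

Here half of the test entities (`phishing`) were seen in training and half (`qakbot`) were not, which is the kind of partial entity overlap the Chapter 4 datasets vary. Filtering out known facts keeps a sampled "negative" from accidentally being a true triple.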
Description
August 2025
School of Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY