Combining supervised machine learning and structured knowledge for difficult perceptual tasks

Klawonn, Matthew
Other Contributors
Hendler, James A.
Fox, Peter A.
Zaki, Mohammed J., 1971-
Ji, Qiang, 1963-
Computer science
Terms of Use
Attribution-NonCommercial-NoDerivs 3.0 United States
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
Abstract
Learning models of visual perception lies at the heart of many computer vision problems, including object detection, image description, and motion tracking. A variety of models can complete such tasks, and the tasks themselves are usually assumed to share the same requirements: receive visual input and perceive some desired content in it. Yet for certain tasks, the desired outputs are very difficult to predict from input images alone. Many perceptual tasks require not only the ability to parse the content of a visual scene, but also the ability to combine visual information with auxiliary knowledge to reach conclusions. Rather than attempting to incorporate auxiliary knowledge into the parameters of a learned model, this work presents an alternative approach.
We hypothesize that there are significant benefits to training perceptual models so that they can interact successfully with external information, while keeping that information external. Toward validating this hypothesis, we take the following steps. First, we motivate representing external knowledge in a structured, symbolic form, a choice grounded in the flexibility and expressivity of knowledge representation and reasoning techniques. We then create and evaluate a novel perceptual model that produces scene graphs, an output that can be combined with structured symbolic knowledge to produce complex inferences. Experiments show that this model performs comparably to the state of the art on standard benchmarks and metrics, while holding significant advantages in the flexibility of its training setup and the variety of outputs it can produce.
To improve compatibility between the scene graph generator and any external knowledge available for producing inferences, we also develop a novel meta-learning method for directing the training process of learning algorithms. Specifically, our method learns online to select training data on which a given model performs well. When combined with the scene graph generator, this meta-learning algorithm facilitates a clean split between learned knowledge and external knowledge: it distills the information in the training data and the external knowledge, constructing training scene graphs that are "learnable" while leaving the remaining information to be used during inference. We test our approach on a semantic search task, showing that the learned perceptual model, the meta-learning algorithm, and structured knowledge inference techniques perform better together than they do separately.
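The central idea of combining a perceptual model's scene graph output with external symbolic knowledge can be illustrated with a minimal sketch. All names, triples, and rules below are hypothetical examples, not taken from the dissertation; the point is only that rule-based inference over (subject, predicate, object) triples can derive relations no detector output directly.

```python
# Hypothetical sketch: enriching a predicted scene graph with an
# external symbolic rule. Entities, relations, and the rule itself
# are illustrative assumptions, not the dissertation's actual data.

# A scene graph represented as (subject, predicate, object) triples,
# as might be produced by a learned perceptual model for one image.
scene_graph = {
    ("person", "holding", "leash"),
    ("leash", "attached_to", "dog"),
}

# External structured knowledge: simple Horn-style rules mapping
# observed relations to inferred ones.
rules = [
    # If a person holds a leash attached to a dog, infer "walking".
    lambda facts: {("person", "walking", "dog")}
    if ("person", "holding", "leash") in facts
    and ("leash", "attached_to", "dog") in facts
    else set(),
]

def infer(graph, rules):
    """Apply rules to a fixed point, returning the enriched fact set."""
    facts = set(graph)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(facts) - facts
            if new:
                facts |= new
                changed = True
    return facts

enriched = infer(scene_graph, rules)
# ("person", "walking", "dog") now holds even though no detector
# emitted that relation directly.
```

Keeping the rules outside the learned model, as the abstract argues, means they can be inspected, edited, or extended without retraining the perceptual component.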
May 2019
School of Science
Dept. of Computer Science
Rensselaer Polytechnic Institute, Troy, NY
Rensselaer Theses and Dissertations Online Collection
CC BY-NC-ND. Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. No commercial use or derivatives are permitted without the explicit approval of the author.