Neural semantic structural modeling for complex question answering tasks

Mou, Xiangyang
Other Contributors
Franklin, W. Randolph
Chen, Tianyi
Yu, Mo
Ji, Qiang, 1963-
Su, Hui
Issue Date
May 2022
Department
Computer Systems Engineering
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute (RPI), Troy, NY. Copyright of original work retained by author.
Full Citation
Abstract
Question answering (QA) is the task of building an automatic system that answers questions about a context presented in a particular format. Over the past few years, considerable progress has been made on QA systems with the advancement of neural networks in natural language processing (NLP). On the one hand, researchers have attained human-level performance on some relatively simple reading comprehension tasks, in which the context is short and the exact answer to the question can be found in the context. On the other hand, systems still struggle with complex questions that can only be answered using more than one piece of evidence in the context, or that require external knowledge such as commonsense. This thesis aims to explore and model structural semantics in various contexts, particularly with language models, in order to equip a QA system with a deeper understanding of context and improve the quality of answers to complex questions. In this dissertation, we discuss representation-based and model-based approaches to integrating semantic structures into deep neural networks, and their application to answering complex questions in three challenging scenarios: multihop QA, book QA, and visual QA. Unlike traditional QA tasks, these scenarios share a common challenge: the answer depends on sophisticated semantics in the textual and multimodal context. Specifically, multihop QA requires an evidence chain that links the question to the true answer, rather than a single piece of evidence. Book QA vastly expands the evidence space, where evidence is intertwined and sometimes hierarchical. Visual QA is distinguished by its multimodal nature: it reaches beyond a single evidence space and aims to find correlations within and across different modalities. To this end, we explore different types of semantic structures in each scenario and develop advanced approaches to integrating prior structural knowledge into deep neural networks.
In particular, 1) we optimize dense representation learning to better model the dependencies among pieces of evidence when constructing a chain of reasoning for multihop QA; 2) we address a fundamental challenge in getting a system to understand long narrative contexts, first identifying the event-centric nature of stories and then advancing open-domain retrieval and reasoning technologies accordingly, so that the system can comprehend concepts and events, as well as their relationships, for book QA; and 3) we investigate rich multimodal interactions and use question-led top-down structures to guide model training, improving its ability to capture the interactions most beneficial to visual QA.
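To illustrate the chained dense retrieval underlying contribution (1), a minimal greedy sketch follows. The inner-product scoring, the additive query update, and the function name `retrieve_chain` are illustrative assumptions for exposition, not the thesis's actual models.

```python
import numpy as np

def retrieve_chain(question_vec, passage_vecs, hops=2):
    """Greedily build an evidence chain by dense retrieval.

    At each hop, the best-scoring passage (by inner product with the
    current query embedding) is selected, and the query vector is
    updated with the retrieved evidence -- a simple stand-in for a
    learned query-update step.
    """
    chain = []
    query = question_vec.astype(float)
    for _ in range(hops):
        scores = passage_vecs @ query      # dense similarity scores
        scores[chain] = -np.inf            # never reselect prior evidence
        best = int(np.argmax(scores))
        chain.append(best)
        query = query + passage_vecs[best] # condition next hop on evidence
    return chain
```

In a trained system, the embeddings would come from a learned encoder and the query update would itself be learned; the loop structure, however, captures the idea of linking the question to the answer through a chain of evidence rather than a single retrieval.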
May 2022
School of Engineering
Dept. of Electrical, Computer, and Systems Engineering
Rensselaer Polytechnic Institute, Troy, NY
Rensselaer Theses and Dissertations Online Collection
Restricted to current Rensselaer faculty, staff and students in accordance with the Rensselaer Standard license. Access inquiries may be directed to the Rensselaer Libraries.