Towards generating coherent stories from image sequences: a computational approach using suspension of disbelief

Authors
Battad, Zev
Issue Date
2024-08
Type
Electronic thesis
Thesis
Language
en_US
Keywords
Cognitive science
Abstract
This dissertation explores a computational model for generating the connections needed to make a story from a sequence of images. Storytelling is an important cognitive task that humans use to communicate and organize information. Being able to automate storytelling would benefit interactive media such as video games, make it easier to automatically summarize large data sets in a way that people understand, and make AI systems more capable of explaining their actions. One storytelling task that humans perform is visual storytelling, the task of making a story from a sequence of images. To do so, humans establish recurring characters and props, connect actions in different images into a plot, and fit their plot to common emotional arcs, like tragedies. Humans performing visual storytelling reinterpret what actions they see in the images and interpret visually distinct objects in different images as being the same if doing so serves the story they want to make. Existing computational story generation systems do not show these human phenomena and cannot reinterpret the information in images for the sake of the story they aim to create. The aim of this thesis research is to create a computational system that makes the connections between images needed to make a story and can reinterpret the information in those images for the sake of that story. The system accepts the objects and their possible actions in a sequence of images as its input. It then establishes what objects are the same between images to make recurring characters and props, connects actions in different images into sequences to make a plot, and fits its plot to common emotional arcs. The system forms these connections by looking at what evidence supports them and creates different sets of connections depending on how it prioritizes different kinds of evidence. 
To evaluate the system, human participants were asked to perform the visual storytelling task, and the objects and actions they identified in the images were gathered as input for the system. By varying the system's parameters, diverse sets of connections were generated from the same sequences of images. These sets of connections were examined to see if the system was equating objects and sequencing actions for its plot in a way that was consistent with how the system was expected to form these connections given its different evidence priorities. The results demonstrate the system's ability to vary the extent to which it equates visually distinct objects for the sake of its story and to adapt its interpretation of actions based on its desired emotional story arc.
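The idea of forming different connections depending on how evidence is prioritized can be illustrated with a small sketch. This is not the thesis system: the `Obj` fields, the two evidence types (matching visual label, matching narrative role), the weights, and the threshold are all hypothetical choices made only to show how changing evidence priorities can change which objects in different images are treated as the same.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Obj:
    image: int   # index of the image the object appears in
    name: str    # hypothetical identifier for the object
    label: str   # visual category, e.g. "dog"
    role: str    # hypothetical narrative role, e.g. "protagonist"


def equate_score(a: Obj, b: Obj, w_visual: float, w_story: float) -> float:
    """Weighted sum of two assumed kinds of evidence that a and b are the same."""
    visual = 1.0 if a.label == b.label else 0.0  # do the objects look alike?
    story = 1.0 if a.role == b.role else 0.0     # would equating them serve the story?
    return w_visual * visual + w_story * story


def equated_pairs(objects, w_visual, w_story, threshold=0.5):
    """Return name pairs from different images whose score reaches the threshold."""
    pairs = []
    for a, b in combinations(objects, 2):
        if a.image != b.image and equate_score(a, b, w_visual, w_story) >= threshold:
            pairs.append((a.name, b.name))
    return pairs


objects = [
    Obj(0, "dog_1", "dog", "protagonist"),
    Obj(1, "wolf_1", "wolf", "protagonist"),
    Obj(1, "dog_2", "dog", "bystander"),
]

# Prioritizing visual evidence equates only objects with the same label.
strict = equated_pairs(objects, w_visual=1.0, w_story=0.0)
# Prioritizing story evidence equates visually distinct objects that share a role.
lenient = equated_pairs(objects, w_visual=0.3, w_story=0.7)

print(strict)
print(lenient)
```

With the visual weight dominant, only the two dogs are equated. With the story weight dominant, the dog and the wolf are treated as one recurring character, which loosely mirrors the reinterpretation of visually distinct objects described in the abstract.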
Description
August 2024
School of Humanities, Arts, and Social Sciences
Publisher
Rensselaer Polytechnic Institute, Troy, NY