Author
Swears, Eran
Other Contributors
Boyer, Kim L.; Radke, Richard J., 1974-; Stewart, Charles V.; Sanderson, A. C. (Arthur C.); Hoogs, Anthony J.;
Date Issued
2015-05
Subject
Computer Systems Engineering
Degree
PhD;
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.;
Abstract
Many scenes in sports, surveillance, and other video domains involve both simple activities and complex multi-agent activities where the moving agents (e.g. pedestrians, vehicles) interact with their surroundings and each other. This dissertation examines how to characterize these interactions by incorporating relational context into models that will enable the automatic recognition of scene elements (e.g. buildings, roadways) and complex activities (e.g. Person-Unload-Vehicle) to reduce the burden on video analysts. Relational context can be anything from known relationships between moving and stationary objects, such as vehicles driving on roadways, to the more explicit self-constraints (e.g. "Probability(vehicle)") or pairwise semantic constraints (e.g. "Near", "Before", "same-direction", "Track-IDs-Different") between two moving objects (e.g. person, vehicle).;
To accomplish this we introduce two complementary lines of research. Our first line of research introduces functional scene element recognition to the outdoor surveillance domain, where traditional indoor approaches cannot be used because of the large number of movers, tracking errors, and clutter. Functional scene elements are defined by their specific function or purpose, as revealed through the interactions that agents have with them, rather than by their appearance, which can be ambiguous. To recognize simple scene elements (e.g. roadways, sidewalks, doorways) we develop weak activity detectors and specialized kernels based on how moving objects typically interact with scene elements. We then extend this work to recognize both simple and complex scene elements (e.g. roadways, intersections, crosswalks, buildings) that can have few interactions, indirect evidence, and multiple types of behavior for a single scene element type. We develop a Pyramid Coding algorithm that captures the multiple modes of behavior by using all layers of a hierarchical divisive clustering algorithm for characterization, followed by a local context window that accumulates evidence over time and incorporates nearby evidence.;
Our second line of research explicitly models the relational context between moving objects to address the complex activity classification and detection problems, while also incorporating scene context for the detection problem. Complex activities consist of a collection of moving objects that interact in a complex manner to perform a specific task (e.g. Vehicle-Dropping-Off-Person). Classifying complex activities is a matter of assigning class labels to a group of pre-segmented evidence, while the much more challenging detection problem requires the algorithm to find the evidence spatially and temporally and then group it into the activity of interest. Many complex activity models can address the classification problem for distinct sets of activity types, but they do not perform as well when discriminating between similar types of complex activities. To address this we introduce a discriminative structure learning algorithm for graphical models that improves classification among activities that have similar types of evidence and interactions. We then extend this work to the detection problem, where we introduce a hierarchical inference framework to make inference with the complex graphical model more practical. We also develop a relational probabilistic graphical model that filters out many false detections using a wide variety of relational context, implemented as soft constraints between pairs of moving objects.;
Experiments on low-altitude surveillance video and on high-altitude surveillance imagery covering a large area of interest show that our functional scene element recognition approaches significantly improve performance, particularly when compared to other relevant methods. Our complex activity classification and detection experiments on surveillance and sports video show that our graphical models improve performance over the most relevant models and that having a diverse set of relational constraints is essential to good detection performance.;
Description
May 2015; School of Engineering
Department
Dept. of Electrical, Computer, and Systems Engineering;
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection;
Access
Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.;