Dynamic probabilistic models for efficient and generalizable human action recognition

Authors
Guo, Hongji
Issue Date
2024-12
Type
Electronic thesis
Thesis
Language
en_US
Keywords
Electrical engineering
Abstract
Human action recognition (HAR) focuses on identifying human actions in video data. It has numerous important applications, such as surveillance and smart homes. Despite its significance, HAR is challenging due to the complex dynamic patterns associated with human actions, the presence of redundant and irrelevant data in the input, the need for high computational efficiency, and the necessity to recognize both known and unknown human actions. This thesis aims to combine state-of-the-art dynamic deep models with explicit uncertainty modeling to achieve accurate, robust, and efficient HAR. First, to address the complex dynamic patterns inherent in human actions, we integrate transformer models with uncertainty modeling for complex HAR. Using the self-attention mechanism of the transformer, our model captures intricate dependencies among atomic actions. Additionally, we extend the transformer into a probabilistic transformer by treating the attention scores as random variables to capture both data and model uncertainties. Evaluations of our model on benchmarks demonstrate its superiority over existing methods. Second, to mitigate the impact of redundant and irrelevant data on HAR, we propose an uncertainty-based spatial-temporal attention mechanism for continuous action recognition. By explicitly modeling the mutual information (MI) between the predicted labels and features using uncertainty, we generate attention masks that enable the model to prioritize high-MI features while disregarding redundant and irrelevant ones. Evaluations on continuous action recognition benchmarks demonstrate the effectiveness of our approach. Third, to achieve both real-time HAR and accurate uncertainty quantification (UQ), we introduce a Bayesian Evidential Deep Learning (BEDL) framework. By combining Bayesian and evidential deep learning, BEDL employs a knowledge distillation procedure to transfer accurate UQ from a Bayesian model to a deep evidential model, which performs fast inference and precise UQ. Experiments demonstrate the accuracy and efficiency of BEDL. Finally, we tackle the challenge of recognizing both known and unknown human actions. We propose a Bayesian open-set HAR method that utilizes a deep ensemble to estimate epistemic uncertainty, enabling the distinction between known and unknown actions. Additionally, we employ optical flow to guide the model's focus on high-motion regions. Our evaluations on benchmarks show that our method outperforms existing approaches.
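The abstract's idea of using a deep ensemble to estimate epistemic uncertainty for open-set recognition can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes the common decomposition in which epistemic uncertainty is the entropy of the averaged prediction minus the average entropy of the individual ensemble members, and the function names, the example probabilities, and the 0.2 threshold are all hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def epistemic_uncertainty(member_probs):
    """Disagreement among ensemble members for a single input.

    member_probs: one class-probability list per ensemble member.
    Returns total uncertainty (entropy of the mean prediction) minus
    the expected data uncertainty (mean of the per-member entropies).
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [
        sum(member[c] for member in member_probs) / n_members
        for c in range(n_classes)
    ]
    total = entropy(mean_probs)
    expected_data = sum(entropy(member) for member in member_probs) / n_members
    return total - expected_data

def is_unknown_action(member_probs, threshold=0.2):
    """Flag an input as an unknown action (threshold is an assumed value)."""
    return epistemic_uncertainty(member_probs) > threshold

# Hypothetical softmax outputs from a three-member ensemble.
members_agree = [[0.90, 0.05, 0.05], [0.88, 0.07, 0.05], [0.92, 0.04, 0.04]]
members_disagree = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]

print(is_unknown_action(members_agree))     # members agree: low epistemic uncertainty
print(is_unknown_action(members_disagree))  # members disagree: high epistemic uncertainty
```

In this sketch, an input is treated as an unknown action when the ensemble members disagree strongly, even if each member is individually confident. In practice the threshold would have to be tuned on held-out data.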
Description
December 2024
School of Engineering
Publisher
Rensselaer Polytechnic Institute, Troy, NY