A Knowledge-Augmented Probabilistic Framework for 3D Human Reconstruction from Monocular Cameras
Authors
Zhang, Yufei
Issue Date
2024-12
Type
Electronic thesis
Thesis
Language
en_US
Keywords
Computer and systems engineering
Abstract
Monocular 3D human reconstruction aims to recover human body and hand configurations from images captured by a single RGB camera. It is an important topic in computer vision with broad applications in human behavior analysis, robotics, and many other fields. Recently, deep learning has enabled promising 3D human reconstruction performance, even in real time. Nonetheless, existing methods suffer from several limitations. First, deep 3D human reconstruction models are data-driven; their performance depends heavily on the quality and quantity of training data. Second, being deterministic, they typically cannot capture uncertainties in the input data and learned models, and hence cannot accurately quantify their reconstruction accuracy. Finally, most approaches perform frame-based static reconstruction, ignoring the underlying human dynamics. To address these limitations, this thesis develops a knowledge-driven probabilistic approach for data-efficient, generalizable, and robust 3D human reconstruction from monocular cameras. To improve data efficiency and generalization, we propose integrating prior knowledge into model training. Instead of learning prior knowledge from Motion Capture (MoCap) data, we systematically exploit different types of well-established generic knowledge that govern the behavior and properties of human bodies and hands. Specifically, we exploit spatial body and hand knowledge, including anatomy, biomechanics, and physics. We incorporate this knowledge through differentiable training loss functions, enabling 3D human reconstruction without any 3D annotations. Moreover, to account for uncertainties in both the input data and the trained models, we introduce novel probabilistic frameworks. We study the impact of these uncertainties on reconstruction accuracy and further leverage the captured uncertainties to improve 3D reconstruction accuracy and robustness.
Finally, we exploit temporal physics knowledge to further improve 3D human reconstruction from monocular videos. For 3D body and motion recovery, we leverage theoretical dynamics and develop a new physics-based reconstruction model that produces not only body reconstructions but also physically plausible body motion and joint force estimates. For hand motion reconstruction, we introduce a novel hand motion refinement framework that combines a diffusion model with intuitive hand dynamics. This framework produces physically plausible hand motion and is robust to degraded image observations.
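To illustrate the idea of encoding anatomical knowledge as a differentiable training loss, the sketch below shows one common pattern: a hinge penalty that is zero while each joint angle stays within its anatomical range and grows with the amount of violation outside it. This is purely illustrative and not the thesis's actual formulation; the function name, angle units, and limit values are assumed for the example.

```python
import numpy as np

def joint_limit_loss(angles, lower, upper):
    """Hinge penalty on joint angles (radians): zero inside the
    anatomical range [lower, upper], growing linearly outside it.
    Built from max(0, .) terms, so it is differentiable almost
    everywhere and usable as a training loss."""
    angles, lower, upper = map(np.asarray, (angles, lower, upper))
    over = np.maximum(angles - upper, 0.0)   # violation above the upper limit
    under = np.maximum(lower - angles, 0.0)  # violation below the lower limit
    return float(np.sum(over + under))

# A joint within its (assumed) range incurs no penalty; one bent
# past its limit contributes the size of the violation.
print(joint_limit_loss([0.5, 2.0], lower=[-1.0, -1.0], upper=[1.5, 1.5]))  # 0.5
```

In a training pipeline such a term would typically be written against a framework's autograd tensors rather than NumPy arrays, so that the violation gradients flow back into the pose predictor without requiring any 3D annotations.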
Description
December 2024
School of Engineering
Publisher
Rensselaer Polytechnic Institute, Troy, NY