Searching for people in camera networks
Authors
Karanam, Srikrishna
Issue Date
2017-05
Type
Electronic thesis
Thesis
Language
ENG
Keywords
Computer Systems engineering
Abstract
Tracking people across multiple cameras is an extremely important task for security and surveillance. Typically, surveillance cameras are dispersed through a large area, and for practical purposes, are installed in a way that results in non-overlapping fields of view. The problem of tracking people requires matching people across such camera views. Specifically, the problem reduces to searching for a person of interest among the many people that appear in a certain camera view. In computer vision, this is typically studied as the person re-identification, or re-id, problem.
A conventional approach to address the person re-id problem is to assume the availability of a single image per person in each camera view and construct discriminative feature spaces in which the feature vectors of images belonging to the same person are close whereas those belonging to different people are relatively far. In this dissertation, we investigate the person re-id problem from a practical real-world perspective, assuming the availability of a track, or sequence, of images for each person in each camera view. Specifically, we develop techniques to address the re-id problem in this “multi-shot” scenario.
First, we investigate the efficacy of learning discriminative dictionaries for re-id. To this end, we propose a method to learn a dictionary capable of discriminatively encoding features representing different people. We formulate the associated optimization problem such that the feature representations, with respect to the dictionary, of images of the same person are close whereas those of different people are relatively far. We present an efficient alternating directions optimization algorithm that results in closed-form updates in each iteration.
We next propose a block sparse recovery approach, exploiting the observation that the feature vector of a particular image of a person in a certain camera view approximately lies in the linear span of the set of feature vectors of the same person in another camera view. By manually constructing a feature dictionary appropriately, the associated coefficient vector can be characterized as being block sparse, and can be estimated by solving a convex optimization problem.
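The block sparse recovery idea can be illustrated with a small sketch. It assumes the dictionary's columns are grouped into one block per gallery person and uses a penalized group-lasso formulation, min_x 0.5 * ||y - D x||^2 + lam * sum_g ||x_g||_2, solved by proximal gradient descent (ISTA). The dissertation's exact convex program and solver may differ. The probe is assigned to the person whose block alone reconstructs it with the smallest residual. All function names, the parameter lam, and the toy data are hypothetical.

```python
import math

# Illustrative sketch: group-lasso (block sparse) recovery solved with ISTA.
# The formulation, solver, and all names here are assumptions for illustration.


def matvec(D, x):
    """Return D @ x, with D stored as a list of rows."""
    return [sum(d * xj for d, xj in zip(row, x)) for row in D]


def rmatvec(D, r):
    """Return D^T @ r."""
    return [sum(D[i][j] * r[i] for i in range(len(D))) for j in range(len(D[0]))]


def block_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink the whole block toward zero."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm <= t:
        return [0.0] * len(v)
    return [(1.0 - t / norm) * a for a in v]


def group_lasso(D, y, blocks, lam, iters=500):
    """Minimize 0.5 * ||y - D x||^2 + lam * sum_g ||x_g||_2 by proximal gradient."""
    # 1 / ||D||_F^2 is a safe step size because ||D||_2^2 <= ||D||_F^2.
    step = 1.0 / sum(d * d for row in D for d in row)
    x = [0.0] * len(D[0])
    for _ in range(iters):
        residual = [yi - fi for yi, fi in zip(y, matvec(D, x))]
        descent = rmatvec(D, residual)  # minus the gradient of the data term
        z = [xi + step * gi for xi, gi in zip(x, descent)]
        for idx in blocks:  # block-wise shrinkage enforces block sparsity
            shrunk = block_soft_threshold([z[j] for j in idx], step * lam)
            for j, val in zip(idx, shrunk):
                z[j] = val
        x = z
    return x


def identify(D, y, blocks, x):
    """Return the index of the block (person) whose coefficients best rebuild y."""
    residuals = []
    for idx in blocks:
        x_g = [x[j] if j in idx else 0.0 for j in range(len(x))]
        recon = matvec(D, x_g)
        residuals.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(y, recon))))
    return min(range(len(blocks)), key=residuals.__getitem__)


# Toy dictionary: columns 0-1 are person 0's gallery features, columns 2-3 person 1's.
D = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.2],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
blocks = [[0, 1], [2, 3]]
probe = [0.6, 0.8, 0.0, 0.0]
x = group_lasso(D, probe, blocks, lam=0.01)
print(identify(D, probe, blocks, x))  # prints 0
```

With this toy dictionary the recovered coefficients are nonzero only in person 0's block (about 0.594 and 0.792), so the whole coefficient vector is block sparse and the probe is identified as person 0.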
We next focus on the data description aspect of the multi-shot scenario. Specifically, we describe the available image sequence data using affine hulls and show how such a data description scheme can be used to improve existing metric learning algorithms that typically consider the average feature vector as a data exemplar to learn feature spaces. We also revisit our discriminative dictionary learning problem and propose a method that directly learns dictionary-based feature representations from affine hulls.
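To make the affine hull description concrete, the sketch below computes a standard distance between the affine hulls of two image sets, as used in image-set classification, and contrasts it with the distance between the sets' average feature vectors. It assumes unrestricted hulls in the raw Euclidean feature space; the dissertation's metric learning formulation built on affine hulls is not reproduced here, and the helper names and toy vectors are hypothetical.

```python
import math

# Illustrative sketch: distance between the (unrestricted) affine hulls of two
# image sets in Euclidean feature space. Names and data are hypothetical.


def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def mean(vectors):
    """Average feature vector of a track (the usual single exemplar)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def orthonormal_basis(vectors, tol=1e-10):
    """Modified Gram-Schmidt; skips directions that are (nearly) dependent."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def affine_hull_distance(set_a, set_b):
    """min ||(mu_a + U_a v_a) - (mu_b + U_b v_b)|| over all coefficients.

    This equals the norm of the part of (mu_a - mu_b) that is orthogonal to
    span(U_a) + span(U_b), where U_* hold each set's centered samples.
    """
    mu_a, mu_b = mean(set_a), mean(set_b)
    directions = [sub(v, mu_a) for v in set_a] + [sub(v, mu_b) for v in set_b]
    r = sub(mu_a, mu_b)
    for q in orthonormal_basis(directions):
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return math.sqrt(dot(r, r))


set_a = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]  # hull: the x-axis
set_b = [[0.0, 1.0, 1.0], [0.0, 1.0, 3.0]]  # hull: the line x = 0, y = 1
print(affine_hull_distance(set_a, set_b))  # prints 1.0
print(round(math.dist(mean(set_a), mean(set_b)), 3))  # prints 2.291
```

Here the two hulls are skew lines 1.0 apart, while the mean vectors are about 2.29 apart, so the hull retains within-track variation that the average feature vector discards. With many samples per track, practical methods usually keep only the leading principal directions of each set, since otherwise the hulls can easily intersect.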
Next, we explore end-to-end feature learning methods using convolutional neural networks for re-id and study their performance in comparison with traditionally used hand-crafted feature representation schemes in re-id. Furthermore, we investigate the efficacy of the learned features in the context of the several algorithms proposed in this dissertation.
Finally, we conduct a large-scale performance evaluation of the current state of the art in person re-id, and develop a uniform evaluation protocol, resulting in a useful benchmark that informs the re-id research community of the current best performing feature extraction, metric learning, and multi-shot ranking algorithms.
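In the benchmark setting, ranking accuracy in re-id is commonly summarized by the cumulative match characteristic (CMC) curve, where CMC(k) is the fraction of probes whose correct gallery identity appears among the top k ranked candidates. The sketch below assumes this metric together with a minimum pairwise Euclidean distance between tracks as a simple multi-shot ranking rule. Neither choice is taken from the dissertation's evaluation protocol, and the function names and toy data are hypothetical.

```python
import math

# Illustrative sketch: CMC evaluation with a simple multi-shot ranking rule.
# The ranking rule, names, and data are assumptions for illustration.


def set_distance(probe_track, gallery_track):
    """Multi-shot distance: smallest pairwise Euclidean distance between tracks."""
    return min(math.dist(p, g) for p in probe_track for g in gallery_track)


def cmc_curve(probes, gallery, max_rank):
    """probes and gallery map identity -> track (list of feature vectors).

    Returns [CMC(1), ..., CMC(max_rank)]. Assumes every probe identity is
    present in the gallery.
    """
    hits = [0] * max_rank
    for pid, track in probes.items():
        ranking = sorted(gallery, key=lambda gid: set_distance(track, gallery[gid]))
        rank = ranking.index(pid)  # 0-based position of the correct match
        for k in range(rank, max_rank):
            hits[k] += 1
    return [h / len(probes) for h in hits]


gallery = {
    "a": [[0.0, 0.0], [0.1, 0.0]],
    "b": [[1.0, 1.0], [1.1, 0.9]],
    "c": [[2.0, 0.0]],
}
probes = {
    "a": [[0.05, 0.1]],
    "b": [[1.9, 0.1], [1.0, 1.2]],
    "c": [[0.2, 0.0]],
}
cmc = cmc_curve(probes, gallery, max_rank=3)
print([round(v, 3) for v in cmc])  # prints [0.333, 0.667, 1.0]
```

In the toy data, probe "b" has its true match ranked second and probe "c" has its true match ranked third, so the curve rises from 1/3 at rank 1 to 1.0 at rank 3.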
Description
May 2017
School of Engineering
Publisher
Rensselaer Polytechnic Institute, Troy, NY