
dc.rights.license: Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.
dc.contributor: Radke, Richard J., 1974-
dc.contributor: Wang, Meng
dc.contributor: Stewart, Charles V.
dc.contributor: Tajer, Ali
dc.contributor.author: Zheng, Meng
dc.date.accessioned: 2021-11-03T09:18:52Z
dc.date.available: 2021-11-03T09:18:52Z
dc.date.created: 2020-08-14T12:20:55Z
dc.date.issued: 2020-05
dc.identifier.uri: https://hdl.handle.net/20.500.13015/2542
dc.description: May 2020
dc.description: School of Engineering
dc.description.abstract: Person re-identification (re-id) has broad appeal in practical applications such as video surveillance and criminal investigations. Given an image or image sequence of a person of interest from a “probe” camera view, person re-id aims to find the correct match from the candidates seen in a “gallery” camera view. Despite tremendous progress in re-id research, several problems still hinder the reliable, real-world use of person re-id. In this thesis, we study person re-id from different operational aspects, in order to design better real-world re-id systems.
dc.description.abstract: In typical re-id research, the probe image or image sequence is matched to a fixed gallery set, ignoring the arrival time of each person. In our first contribution, we designed and collected a new, large-scale, multi-camera re-id dataset called RPIfield. It preserves time-stamp information for every detected person in each video sequence, to better simulate real-world re-id operational scenarios. With this information about candidates' reappearances, re-id algorithms can be applied to provide instantaneous rank lists for probes that simulate a real-time re-id system. Based on this dataset, we propose a new evaluation methodology called the Rank Persistence Curve (RPC) for evaluating re-id algorithms in circumstances when the same person of interest can appear multiple times in the gallery, as well as when the performance over multiple persons of interest should be aggregated. We demonstrate the effectiveness of RPCs on RPIfield for allowing users to make informed choices about the expected performance of candidate re-id algorithms in real-world deployments.
dc.description.abstract: Next, we propose a novel deep Convolutional Neural Network (CNN) architecture for image-based person re-id, to deal with typical challenges such as viewpoint change, misalignment, and partially-cropped person images. We address these questions by means of a new attention-driven Siamese learning architecture, called the Consistent Attentive Siamese Network. Our key innovations compared to existing, competing methods include (a) a flexible framework design that produces attention with only identity labels as supervision, (b) explicit mechanisms to enforce attention consistency among images of the same person, and (c) a new Siamese framework that integrates attention and attention consistency, producing principled supervisory signals as well as the first mechanism that can explain the reasoning behind the Siamese framework's predictions. Experimental results on several benchmark re-id datasets show that our proposed algorithm achieves a new state-of-the-art in accuracy performance.
dc.description.abstract: Based on the proposed Consistent Attentive Siamese Network (CASN) with the 2D CNN architecture we designed for image-based re-id, we next extend the CASN to a novel 3D CNN network with spatiotemporal attention consistency to deal with video-based re-id tasks. While 2D CNNs have shown superior performance on image-based re-id, spatiotemporal feature learning and temporal aggregation of feature vectors remain key, unsolved problems for matching person video tracks. By introducing 3D convolution, spatial and temporal features are learned simultaneously for image sequences, at the same time producing 3D attention under the supervision of person identity labels. Attention consistency can then be performed along both spatial and temporal dimensions for invariant feature learning and effective feature aggregation. Experimental results on several benchmark video re-id datasets show that our proposed 3D CNN-based network achieves comparable performance to the recent state-of-the-art methods for video-based re-id tasks.
dc.description.abstract: While we introduced novel attentive deep re-id networks for both image- and video-based re-id, all these networks require well-trained classifiers for retrieving meaningful localized regions and visual explanations, which may limit their performance and applicability in re-id tasks. Next, we propose to learn novel similarity functions for re-id and provide the first method to generate generic visual similarity explanations from learned similarity predictors with gradient-based attention. We demonstrate that our technique is agnostic to the specific similarity model type for re-id, e.g., we show applicability to Siamese, triplet, and quadruplet models. Furthermore, we make our proposed similarity attention a principled part of the learning process, resulting in a new paradigm for learning similarity functions. We demonstrate that our proposed similarity learning mechanism results in more generalizable, as well as explainable, similarity re-id models. Finally, we demonstrate the generality of our framework by achieving state-of-the-art accuracy performance on person re-id tasks, as well as competitive results on low-shot semantic segmentation.
dc.description.abstract: Finally, we conclude the thesis by considering the problem of trajectory recovery of persons of interest walking through a network of cameras. To date, the majority of person re-id research has been focused on the matching accuracy performance of query images taken from one camera view to candidate images from other camera views. On the other hand, in real-world surveillance systems, the users typically desire to consistently re-identify a person of interest through multiple camera views and spatially follow his/her moving path, which is difficult to achieve with traditional re-id algorithms. To this end, we present the first path recovery pipeline with transition time modeling and an effective pruning strategy, which is able to automatically reconstruct the trajectory of persons of interest moving in a network of cameras. We perform experiments on the RPIfield dataset, showing reconstructed trajectory results for 112 participants. We demonstrate the effectiveness of our proposed algorithm with respect to several metrics that consider different practical perspectives.
dc.language.iso: ENG
dc.publisher: Rensselaer Polytechnic Institute, Troy, NY
dc.relation.ispartof: Rensselaer Theses and Dissertations Online Collection
dc.subject: Electrical engineering
dc.title: Design of real-world person re-identification systems
dc.type: Electronic thesis
dc.type: Thesis
dc.digitool.pid: 180120
dc.digitool.pid: 180121
dc.digitool.pid: 180122
dc.rights.holder: This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
dc.description.degree: PhD
dc.relation.department: Dept. of Electrical, Computer, and Systems Engineering

