Invariant Feature Learning In Person Re-Identification

Posted on: 2022-06-12 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: C Q Wan | Full Text: PDF
GTID: 1488306323482074 | Subject: Information and Communication Engineering
Abstract/Summary:
Person re-identification (ReID) aims to identify pedestrians across different surveillance cameras within a limited range of space and time. The key to person ReID lies in handling the rich variances in pedestrian samples and extracting invariant feature representations that describe identity. These variances can be roughly divided into three kinds. The first is human variances, caused by walking movement and the viewing angle of the surveillance camera. The second is style variances, resulting from specific environmental conditions and the bias of the surveillance camera. The third is noise variances, caused by malfunctioning cameras or malicious man-made attacks.

Existing invariant feature learning methods mainly start from the entire image, and three problems are likely to occur: (1) human variances are easily ignored, resulting in the loss of local details; (2) unknown image styles are difficult to cope with, which limits the model's applicability to new surveillance scenarios; (3) the model is easily affected by noise, especially deliberately designed adversarial perturbations, leading to recognition errors. How to deal with the complex and diverse changes in pedestrian images is therefore an urgent problem in person ReID. This dissertation analyzes the characteristics of the three variances and designs new invariant feature learning methods: human local learning for human variances, network decoupling learning for style variances, and manifold space learning for noise variances.

For human local learning, although targeted research already exists, it is difficult for existing methods to obtain accurate body parts and fully expressive representations. This leads to large intra-class differences due to unaligned body parts, and to smaller inter-class distances due to insufficient feature expression. This dissertation therefore emphasizes the importance of both localization and description in human local learning. An attention mechanism module with concentration constraints and discriminative constraints is designed, and feature descriptors based on statistics, location, and relation are proposed. This greatly improves the stability of the algorithm and extracts local pedestrian details that remain invariant.

Regarding style variances, most current work relies on the entire pedestrian image to mine the unknown image style of the target scenario. These methods depend on existing pedestrian image data and models for transfer learning. However, because the target style is scenario-specific, this prior knowledge cannot be applied directly to the unknown scenario. This dissertation argues that the pedestrians in images are similar across scenarios, so their feature extraction can be shared. The image style, by contrast, is specific to each scenario, so its features can only be obtained from the target scenario. For this purpose, this dissertation proposes network decoupling learning, in which different modules are responsible for extracting the shared features and the scenario-specific features. This method makes full use of existing prior knowledge while mining as much of the unique content of the target scene as possible. It improves re-identification performance in the target scenario, and its performance is very close to that of methods that use annotations.

With regard to noise variances, little work has addressed this problem, which limits the application and expansion of ReID in real life. This dissertation therefore proposes manifold space learning, based on the assumption that "noise causes samples to deviate from the data manifold". The manifold assumption states that real-world high-dimensional data is actually distributed on a low-dimensional manifold. In other words, the data on the manifold is noise-free, and noise pushes the data off the manifold and outside the data distribution that the model saw during training. This dissertation therefore proposes to estimate the manifold space of real data and defines a manifold projection operation that reprojects noisy data back onto the manifold. In this way, a connection between noisy data and real noise-free data is constructed, which improves the stability of the model against noise and yields invariant features.

The main contributions are summarized in the following three points:
· For human variances, we emphasize the importance of discovering and describing local parts in human local learning, and propose constrained attention mechanism modules together with rich feature descriptors to achieve these goals. This improves the effectiveness of the person re-identification algorithm.
· For style variances, we emphasize the importance of the spatial-temporal relationship, and for the first time propose a network decoupling method that learns the features of the pedestrian and the background separately. Without using data annotation, the performance is very close to that obtained with data annotation.
· For noise variances, we are the first to propose manifold blocks and manifold networks. By modeling the manifold space of features and introducing manifold projection to eliminate noise interference, the robustness of the person re-identification model against noise is greatly improved.
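
The manifold projection idea described above can be illustrated with a small sketch. This is not the dissertation's manifold block or manifold network. It assumes, purely for illustration, that the manifold is a linear subspace estimated from clean samples (a PCA-style approximation), and the function names fit_linear_manifold and project_to_manifold are hypothetical.

```python
import math
import random


def fit_linear_manifold(samples, k, iters=200, seed=0):
    """Estimate a k-dimensional affine subspace (mean + orthonormal basis)
    from clean samples. ASSUMPTION: a linear subspace stands in for the
    learned data manifold; the dissertation's estimator is not specified here."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in samples]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # Power iteration step: multiply by the covariance matrix.
            v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            # Deflation: remove components along directions already found.
            for u in basis:
                dot = sum(vi * ui for vi, ui in zip(v, u))
                v = [vi - dot * ui for vi, ui in zip(v, u)]
            norm = math.sqrt(sum(vi * vi for vi in v))
            if norm == 0.0:
                raise ValueError("data has fewer than k directions of variance")
            v = [vi / norm for vi in v]
        basis.append(v)
    return mean, basis


def project_to_manifold(x, mean, basis):
    """Orthogonally project x onto the affine subspace mean + span(basis)."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    out = list(mean)
    for u in basis:
        coeff = sum(ci * ui for ci, ui in zip(centered, u))
        out = [oi + coeff * ui for oi, ui in zip(out, u)]
    return out


# Clean data lying on a 1-D line in 3-D space, direction (1, 2, -1).
clean = [[t, 2.0 * t, -t] for t in range(-5, 6)]
mean, basis = fit_linear_manifold(clean, k=1)

# A clean point pushed off the line by noise orthogonal to it.
noisy = [3.0 + 2.0, 6.0 - 1.0, -3.0]
restored = project_to_manifold(noisy, mean, basis)
print(restored)
```

In this toy setting the projected point should land close to the clean point (3, 6, -3), because the added noise is orthogonal to the estimated subspace. Real noise also has a component along the manifold, which projection alone cannot remove.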
Keywords/Search Tags: person re-identification, invariant feature learning, constrained attention mechanism, network decoupling, network structure, manifold learning