Cross-Modality Person Re-Identification In All-Day Surveillance Scenario

Posted on: 2024-06-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Y Wei
Full Text: PDF
GTID: 1528307340461734
Subject: Multimedia Information Theory
Abstract/Summary:
With society paying increasing attention to public safety, large numbers of video surveillance cameras are deployed in public places such as shopping malls, residential communities and parks to monitor dangerous individuals and keep pedestrians safe. However, faced with complex surveillance networks and massive amounts of surveillance data, it is difficult to analyze footage and screen for suspicious people quickly by manpower alone. Computer vision and machine learning technologies have therefore been adopted to assist with pedestrian retrieval and identification. Because face images captured by surveillance cameras usually have low resolution, it is difficult to identify criminals directly with face recognition. Instead, pedestrian trajectories must be determined, and arrests made, on the basis of attributes such as body shape, clothing and posture. Person re-identification technology has emerged to meet this need: it aims to retrieve specific pedestrians from images or videos using computer vision techniques.

In low light at night, however, cameras automatically switch to infrared mode to capture pedestrian images. Because infrared images lack color information, they are difficult to match with visible-light pedestrian images taken during the day. How to use cross-modality person re-identification to accurately retrieve pedestrians across heterogeneous visible and infrared surveillance images is therefore of great research significance for smart-city security. The challenge lies not only in the large cross-modality gap between heterogeneous images, but also in the image-level differences caused by illumination changes, viewpoint changes and human posture variations across different cameras of the same modality. A key problem is therefore how to mine the modality-invariant information in heterogeneous pedestrian images and map the features into a shared semantic space. This dissertation aims to counter the adverse effects of modality misalignment at both the image level and the feature level, and proposes a series of new methods for cross-modality person re-identification. The main contributions are summarized as follows:

1. A cross-modality person re-identification method based on reciprocal generative adversarial networks is proposed to improve the quality of the generated heterogeneous images and achieve image-level modality unification while reducing heterogeneous feature discrepancies. Existing methods based on heterogeneous image translation mostly use generative adversarial networks to unify images into the same modality and bridge the modality gap. However, these methods introduce noise during image translation, which degrades the quality of the generated fake heterogeneous images. To address this problem, a joint loss is proposed that narrows the distance between feature distributions in the hidden space during mutual translation between visible and infrared images, so that the generated images retain both the pedestrian identity and the image style of real images. The original images and the generated heterogeneous images are then fed together into an attention-based discriminative feature extraction network to capture more discriminative features and suppress the modality discrepancy. Experimental results show that the proposed method achieves modality unification and improves the accuracy of cross-modality person re-identification.

2. A cross-modality person re-identification method based on syncretic modality collaborative learning is proposed to build a three-modality shared semantic feature space, thereby improving the accuracy of cross-modality person retrieval. The key challenge of cross-modality person re-identification is to map pedestrian images of different modalities into the same high-dimensional feature space. Existing methods use generated images of a third modality to assist modality-shared representation learning. However, the feature distribution of the generated images is highly correlated with the visible feature distribution and unrelated to the infrared feature distribution, which limits the quality of the learned modality-shared features. To tackle this problem, a syncretic modality that preserves the characteristics of both infrared and visible images is proposed to assist distribution similarity learning and challenge-enhanced homogeneity learning. Moreover, incremental training is introduced to minimize the distance between the feature distribution centers of the three modalities and gradually reduce modality differences. Extensive experiments show that, through modality fusion and progressive representation learning, the proposed method improves cross-modality person re-identification performance more effectively than directly learning modality-shared features.

3. A cross-modality person re-identification method based on a flexible body partition model is proposed to partition human body parts adaptively according to feature map responses and to align part features. Existing part-based person re-identification methods usually partition the human body with fixed templates or a pose estimation model. Because of variations in person pose and viewpoint, their ability to match part features is limited. To solve this problem, a feature-response-driven adaptive body partition is proposed, which uses the k-means algorithm to cluster feature maps for unsupervised part segmentation. Fine-grained feature learning is then conducted by extracting deep semantic features that represent different parts of the human body. In addition, to learn a more discriminative identity representation, an adaptive weighting strategy assigns a weight to each part according to its importance in representation learning. Finally, cross-modality adversarial learning is performed to encourage the network to capture modality-invariant representations and map heterogeneous features into the shared space. Experimental results show that the proposed method achieves modality alignment in the feature space, thereby improving retrieval performance.

4. A cross-modality person re-identification method based on a dual adversarial representation disentanglement model is proposed, which separates modality-specific and modality-shared features to make the model more robust to identity-irrelevant factors such as color and background. Existing methods typically use two-stream parameter-independent networks and parameter-sharing network layers to capture modality-specific and modality-shared features, respectively. However, they discard the modality-specific features during learning, even though these features can help the network learn more effective modality-shared features. Therefore, to better learn modality-invariant features, this dissertation separates modality-shared and modality-specific representations through dual adversarial operations, namely image-level channel exchange and feature-level magnitude change. In addition, to prevent excessive changes in modality-specific features from making the model difficult to converge, a bi-constrained noise alleviation loss is introduced to control the difference in feature distribution before and after the feature-level magnitude change. Experimental results show that the proposed method overcomes the influence of background and illumination and discriminates between different identities, thereby improving the performance of cross-modality person re-identification.
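
As a rough illustration of the feature-response-driven body partition described in contribution 3, the Python sketch below clusters the spatial locations of a single feature map with k-means and then averages the features inside each cluster to obtain one vector per part. It is only a toy example under stated assumptions, not the dissertation's implementation: the function names (`kmeans_part_labels`, `part_features`), the nested-list H x W x C layout, the farthest-point initialization and the per-part mean pooling are all hypothetical choices made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_part_labels(feature_map, num_parts, iters=10):
    """Cluster the H x W locations of one feature map into `num_parts` groups.

    feature_map: nested list of shape H x W x C (assumed layout).
    Returns an H x W grid of integer part labels.
    """
    h, w = len(feature_map), len(feature_map[0])
    pixels = [feature_map[i][j] for i in range(h) for j in range(w)]

    # Deterministic farthest-point initialization (an assumption of this
    # sketch): start from the first location, then repeatedly add the
    # location farthest from all centers chosen so far.
    centers = [list(pixels[0])]
    while len(centers) < num_parts:
        far = max(pixels, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))

    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: each location joins its nearest center.
        labels = [
            min(range(num_parts), key=lambda k: sq_dist(p, centers[k]))
            for p in pixels
        ]
        # Update step: each center moves to the mean of its members.
        for k in range(num_parts):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]

    return [labels[i * w:(i + 1) * w] for i in range(h)]


def part_features(feature_map, labels, num_parts):
    """Average the feature vectors of all locations assigned to each part."""
    channels = len(feature_map[0][0])
    sums = [[0.0] * channels for _ in range(num_parts)]
    counts = [0] * num_parts
    for row_feats, row_labels in zip(feature_map, labels):
        for feat, lab in zip(row_feats, row_labels):
            counts[lab] += 1
            for c, value in enumerate(feat):
                sums[lab][c] += value
    return [
        [s / counts[k] if counts[k] else 0.0 for s in sums[k]]
        for k in range(num_parts)
    ]


if __name__ == "__main__":
    # Toy 4 x 2 feature map with 2 channels: the upper half responds on
    # channel 0 and the lower half on channel 1.
    fmap = [
        [[1.0, 0.0], [0.9, 0.1]],
        [[1.0, 0.0], [0.9, 0.1]],
        [[0.0, 1.0], [0.1, 0.9]],
        [[0.0, 1.0], [0.1, 0.9]],
    ]
    grid = kmeans_part_labels(fmap, num_parts=2)
    print(grid)
    print(part_features(fmap, grid, num_parts=2))
```

In this sketch the part boundaries follow the feature responses rather than a fixed horizontal template, which is the property the abstract attributes to the adaptive partition. The adaptive part weighting and the adversarial learning stage are not modeled here.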
Keywords/Search Tags: cross-modality person re-identification, heterogeneous image translation, body partition, representation disentanglement, modality-shared feature learning