
Research On Person Re-Identification Based On Feature Prediction And Fusion

Posted on: 2024-03-20  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J H Yin  Full Text: PDF
GTID: 1528306944956869  Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid advancement of multimedia technology, video surveillance has become increasingly prevalent in public spaces, making person re-identification a prominent research topic in computer vision. The goal of person re-identification is to accurately identify and retrieve images of the same person across different cameras. Among the challenges encountered in complex real-world scenarios, four key issues stand out: improving the performance of re-identification systems on unlabeled data, effectively utilizing camera information to learn view-invariant visual representations of persons, resolving the incompatibility between contrastive learning systems and unsupervised clustering algorithms on unlabeled data, and exploiting modality-specific data distribution characteristics to improve the accuracy of cross-modal person recognition systems.

This thesis addresses these four issues through feature prediction and feature fusion, with particular emphasis on self-supervised mask prediction, multi-view contrastive learning, multi-level feature fusion, and multi-modal feature fusion. Feature prediction extracts effective feature representations from raw data in order to better capture the intrinsic structure and relationships of the data. First, this thesis explores feature prediction methods in detail, including mask prediction for unlabeled data and contrastive prediction for multi-view person images. Feature prediction forms the foundation for feature fusion, which aims to combine multiple features effectively to enhance the performance of the final model. Second, this thesis examines the principles and methods of feature fusion, including a person feature fusion algorithm based on multi-level real-time contrastive learning and a multimodal fusion algorithm based on similarity inference and prototype learning. Finally, extensive experimental results demonstrate that the proposed methods effectively enhance the completeness and robustness of the learned features, as well as the efficiency and scene adaptability of the algorithms. The main innovations and contributions of this thesis can be summarized as follows.

· We propose a novel person feature learning method that leverages clustering and mask prediction to exploit unlabeled data effectively. Existing unsupervised person re-identification algorithms often rely on low-level local information in the feature space and neglect global information. To address this, our method combines spatial mask operations with an offline temporal network to generate spatio-temporal mask features, and enhances the information consistency of different masked features in both the visual and temporal dimensions through consistency learning. Furthermore, to ensure the completeness of the learned features, we apply the same supervision to different mask features in the classification task, which strengthens the visual consistency between features. Finally, our approach enables unsupervised clustering to obtain more complete features and achieve semantic-level clustering, yielding a person recognition network that focuses on semantic-level global information.

· We propose a person re-identification method based on multi-view contrastive prediction to address data distribution differences between cameras. Given the significant style differences among camera views, we leverage kernel density estimation to predict image encodings from different viewpoints. This prediction process facilitates model training and improves robustness to viewpoint changes. To strengthen the model's ability to discriminate between person identities, we introduce a contrastive learning mechanism based on positive and negative samples during multi-view encoding prediction, which enlarges the feature differences between different persons while maintaining the compactness of identity features for the same person. Experimental results show that the proposed multi-view contrastive prediction method effectively improves the recognition accuracy of person re-identification models on multiple public datasets.

· To address the incompatibility between existing person re-identification systems and clustering algorithms, we propose a person feature fusion algorithm based on multi-level real-time contrastive learning. Taking into account the density-reachability principle of unsupervised clustering algorithms, we introduce a real-time memory update strategy that preserves the original data feature distribution and enhances the clustering capability of unsupervised clustering algorithms. Additionally, we randomly select real-time features as category agents to further optimize the feature distribution. To improve the efficiency and recognition capability of the proposed algorithm, we propose two real-time contrastive learning methods that effectively combine multiple feature sources. Experimental results demonstrate that our framework improves recognition performance on both unsupervised and domain adaptation tasks.

· We propose a multimodal fusion algorithm based on similarity inference and prototype learning to reduce differences in data distribution across modalities. Specifically, we construct a feature similarity matrix to mine positive and negative samples between modalities based on overlapping person identity features in multimodal datasets, and use this matrix to perform similarity inference for the fusion of features from different modalities. To further exploit multi-granularity feature information within each modality, we implement multi-granularity prototype learning based on modality-shared features, enhancing the model's adaptability in cross-modal scenarios. Experimental results demonstrate the effectiveness of the proposed method in improving recognition accuracy on cross-modal image retrieval tasks.

In summary, this thesis studies person re-identification in complex scenes, focusing on the following challenges: effectively utilizing unlabeled samples, learning view-invariant visual features, reconciling contrastive learning systems with unsupervised clustering algorithms, and extending person recognition systems to multimodal scenarios. To address these challenges, we incorporate multi-view learning and real-time feature updating into the contrastive learning framework, leveraging camera information and the properties of clustering algorithms. We explore the completeness of feature learning on unlabeled data and demonstrate the effectiveness of our approach through experimental results. Furthermore, we extend the approach to multimodal scenarios, providing theoretical references and practical implications for person re-identification in realistic settings.
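The mask-prediction contribution can be illustrated with a minimal sketch. The thesis does not publish code, so the encoder, mask parameters, and momentum schedule below are illustrative assumptions: a random spatial mask produces two views of one image, an online encoder and a slowly updated offline copy (standing in for the offline temporal network) encode them, and a cosine-based consistency loss aligns the two masked features.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spatial_mask(img, patch=4, drop_ratio=0.3):
    """Zero out a random subset of patch x patch blocks (spatial mask operation)."""
    h, w = img.shape
    masked = img.copy()
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            if rng.random() < drop_ratio:
                masked[i:i + patch, j:j + patch] = 0.0
    return masked

def encode(img, W):
    """Toy linear encoder standing in for the person-feature backbone."""
    f = W @ img.reshape(-1)
    return f / (np.linalg.norm(f) + 1e-8)

# Online encoder weights and an offline (momentum) copy -- both illustrative.
dim, hw = 8, 16 * 16
W_online = rng.normal(size=(dim, hw))
W_offline = W_online.copy()

img = rng.normal(size=(16, 16))
view_a = random_spatial_mask(img)
view_b = random_spatial_mask(img)

# Consistency loss: the online encoding of one masked view should match
# the offline encoding of another masked view of the same image.
f_a = encode(view_a, W_online)
f_b = encode(view_b, W_offline)
consistency_loss = 1.0 - float(f_a @ f_b)  # 1 - cosine similarity

# Momentum update keeps the offline network a slow-moving average.
m = 0.99
W_offline = m * W_offline + (1 - m) * W_online
```

In a full system the same supervised classification signal would additionally be applied to every masked feature, as described above, to enforce visual consistency across masks.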
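The real-time memory update strategy from the third contribution can likewise be sketched. Everything here is a hedged assumption about one plausible realization, not the thesis's actual implementation: a memory bank holds one normalized prototype per pseudo-label cluster, an InfoNCE-style loss contrasts a query feature against all prototypes, and each query is mixed into its cluster slot immediately so the memory tracks the current feature distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

def l2n(x):
    """L2-normalize along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

num_clusters, dim = 5, 8
# Cluster memory: one normalized prototype per pseudo-label cluster.
memory = l2n(rng.normal(size=(num_clusters, dim)))

def contrastive_loss(feat, label, memory, tau=0.05):
    """InfoNCE against cluster prototypes: pull toward own cluster, push from others."""
    logits = memory @ feat / tau
    # Numerically stable log-softmax.
    logp = logits - logits.max() - np.log(np.exp(logits - logits.max()).sum())
    return -logp[label]

def realtime_update(memory, feat, label, momentum=0.2):
    """Real-time memory update: immediately mix the instance feature into its
    cluster slot, keeping the prototype on the unit sphere."""
    memory[label] = l2n(momentum * feat + (1 - momentum) * memory[label])
    return memory

feat = l2n(rng.normal(size=dim))
label = 2
loss = float(contrastive_loss(feat, label, memory))
memory = realtime_update(memory, feat, label)
```

Updating the memory after every query, rather than once per epoch, is what keeps the stored distribution consistent with the density-reachability assumptions of the clustering step.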
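The cross-modal similarity-inference idea from the fourth contribution can also be sketched in a few lines. The modality names, the argmax-based mining rule, and the additive fusion below are all illustrative assumptions: a similarity matrix compares features across two modalities, the most similar cross-modal sample is treated as a positive (the rest as negatives), and each feature is fused with its inferred positive.

```python
import numpy as np

rng = np.random.default_rng(2)

def l2n(x):
    """L2-normalize along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

# Toy features for two modalities (e.g. visible and infrared), 6 samples each.
n, dim = 6, 8
vis = l2n(rng.normal(size=(n, dim)))
ir = l2n(rng.normal(size=(n, dim)))

# Cross-modal feature similarity matrix: entry (i, j) compares visible
# sample i with infrared sample j.
sim = vis @ ir.T

# Similarity inference for sample mining: for each visible sample, treat
# its most similar infrared sample as a positive, the rest as negatives.
pos_idx = sim.argmax(axis=1)
neg_mask = np.ones_like(sim, dtype=bool)
neg_mask[np.arange(n), pos_idx] = False

# Fuse each visible feature with its inferred cross-modal positive.
fused = l2n(vis + ir[pos_idx])
```

A full system would build this matrix from the overlapping identity features of a multimodal dataset and add multi-granularity prototype learning on top; this sketch only shows the mining and fusion step.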
Keywords/Search Tags:Person re-identification, Contrastive learning, Mask prediction, Multi-view consistency, Multimodal fusion