Cross-modal Person Re-identification Based On Deep Learning

Posted on: 2024-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Han
GTID: 2568307094481144
Subject: Control Science and Engineering

Abstract/Summary:
Person re-identification is the task of matching images of the same pedestrian captured at different times or by different cameras in a surveillance scene. Its main purpose is to match and recognize the same pedestrian accurately across images using computer vision, enabling applications such as pedestrian tracking and behavior analysis. In recent years, the rapid development of deep learning has brought significant breakthroughs in person re-identification. In more diverse and open scenarios, however, the task remains challenging. In cross-modal person re-identification, for example, images within the same modality differ in visual appearance (pose, background, and viewpoint), while images from different modalities differ in contrast and resolution and contain noisy samples (such as occlusion and misalignment), which makes it difficult to meet the requirements of real-world applications. This thesis therefore applies deep learning methods to improve the person re-identification model. The specific work is as follows:

Existing person re-identification methods assume that the images are roughly aligned, so the person representations they learn are relatively coarse. To improve the model's ability to learn person representations, a feature alignment module is introduced. The module exploits dense correspondences between cross-modal person images to suppress modality-related features in the person representation, making the learned features more discriminative. In addition, a heterogeneous center triplet loss is added to the original loss function. It compares the center of each person with the anchor to reduce the adverse impact of false detections and improve model performance. On the two public datasets RegDB and SYSU-MM01, the model achieves mAP of 73.79% and 67.49%, respectively, indicating that it improves cross-modal person re-identification performance.

To further improve accuracy, an attention mechanism is added to the ResNet-50 backbone network and its activation function is optimized, giving an ECA+FReLU structure. ECA is a lightweight channel attention module that brings a significant performance improvement using only k (k ≤ 9) parameters. Rather than reducing computational complexity through dimensionality reduction, the ECA module uses a fast one-dimensional convolution, with a kernel size adapted to the channel dimension, to achieve local cross-channel interaction efficiently. FReLU is a funnel activation function that extends ReLU and PReLU to a 2D activation by introducing a spatial condition, giving pixel-level modeling capacity so that the network can capture more complex visual layouts. With the ECA+FReLU structure, mAP on RegDB and SYSU-MM01 improves by 1% and 1.45%, respectively, improving accuracy on the person re-identification task.
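
The abstract does not give the exact form of the heterogeneous center triplet loss, so the following is only a minimal sketch of one common formulation. It assumes that features are averaged into one center per (identity, modality) pair, that the positive for each center is the same identity's center in the other modality, and that the negative is the nearest center of any other identity. The function name, the modality tags `"vis"` and `"ir"`, and the margin value are illustrative assumptions, not details from the thesis.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hetero_center_triplet_loss(features, labels, modalities, margin=0.3):
    """Sketch of a center-based cross-modal triplet loss.

    features:   list of feature vectors (lists of floats)
    labels:     identity id for each feature
    modalities: "vis" or "ir" for each feature (assumed tags)
    """
    # Group features by (identity, modality) and average each group into a center.
    groups = defaultdict(list)
    for f, y, m in zip(features, labels, modalities):
        groups[(y, m)].append(f)
    centers = {key: mean_vector(vs) for key, vs in groups.items()}

    total, count = 0.0, 0
    for (y, m), anchor in centers.items():
        other = "ir" if m == "vis" else "vis"
        if (y, other) not in centers:
            continue  # this identity has no cross-modal center in the batch
        # Positive: same identity, other modality.
        d_pos = euclidean(anchor, centers[(y, other)])
        # Negative: nearest center belonging to a different identity.
        negatives = [euclidean(anchor, c)
                     for (yj, _), c in centers.items() if yj != y]
        if not negatives:
            continue
        total += max(0.0, margin + d_pos - min(negatives))
        count += 1
    return total / count if count else 0.0
```

Because the loss works on centers and not on individual samples, a single outlying image shifts its center only slightly, which is one plausible reading of the abstract's claim that the loss reduces the impact of false detections.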
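
The ECA mechanism described in the abstract can be illustrated with a small pure-Python sketch. It assumes the usual three steps of efficient channel attention: global average pooling per channel, a one-dimensional convolution with k shared weights across neighbouring channels, and a sigmoid gate that rescales each channel. The kernel-size mapping with `gamma=2` and `b=1` follows the defaults commonly used with ECA and is an assumption here. The thesis's actual settings are not stated in the abstract.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Odd 1D kernel size chosen as a function of the channel count."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def eca(feature_map, weights):
    """Apply channel attention to a [C][H][W] nested-list feature map.

    weights: the k shared 1D-convolution weights (k odd); these are the
    only learnable parameters of the module in this sketch.
    """
    pad = len(weights) // 2
    # 1) Global average pooling: one scalar descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_map]
    # 2) 1D convolution over neighbouring channels with zero padding.
    #    The channel dimension is never reduced.
    padded = [0.0] * pad + pooled + [0.0] * pad
    gates = [sigmoid(sum(w * padded[c + j] for j, w in enumerate(weights)))
             for c in range(len(pooled))]
    # 3) Rescale every channel by its gate.
    return [[[g * v for v in row] for row in ch]
            for g, ch in zip(gates, feature_map)]
```

With this mapping, a 2048-channel stage would use a kernel of size 7 and a 256-channel stage a kernel of size 5, both within the k ≤ 9 bound mentioned in the abstract.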
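
To make the relationship between FReLU, ReLU, and PReLU concrete, here is a hedged single-channel sketch. It assumes the funnel activation takes the form max(x, T(x)), where T(x) is a 3x3 spatial filter applied to the same channel. The batch normalization that usually follows the filter is omitted for brevity, and the function name and zero-padding choice are illustrative assumptions.

```python
def frelu_channel(x, kernel, bias=0.0):
    """Funnel activation on one channel: y = max(x, T(x)).

    x:      [H][W] nested list of floats
    kernel: 3x3 nested list, the spatial condition T (zero padding at borders)
    """
    h, w = len(x), len(x[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # T(x) at (i, j): weighted sum over the 3x3 neighbourhood.
            t = bias
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        t += kernel[di + 1][dj + 1] * x[ii][jj]
            # The activation keeps whichever is larger: the input or its
            # spatially conditioned response.
            row.append(max(x[i][j], t))
        out.append(row)
    return out


# Special cases: an all-zero kernel gives ReLU, and a kernel whose only
# non-zero entry is a centre weight a < 1 gives PReLU with slope a.
relu_kernel = [[0.0] * 3 for _ in range(3)]
prelu_kernel = [[0.0, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.0]]
```

A general 3x3 kernel lets the threshold at each pixel depend on its neighbours, which is the pixel-level spatial modeling the abstract refers to.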
Keywords: Deep learning, Cross-modal person re-identification, Alignment learning, Heterogeneous center triplets, Channel attention