Research on Algorithms for Single-Modal and Cross-Modal Person Re-identification

Posted on: 2024-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: Q W Tang
Full Text: PDF
GTID: 2558307094479484
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
Person re-identification (Re-ID) aims to determine whether pedestrians captured by different cameras are the same person. It is usually treated as an image retrieval problem and is commonly used for cross-camera pedestrian tracking and surveillance. It is a challenging task that draws on computer vision, pattern recognition, and deep learning. Most current person re-identification methods focus on learning discriminative and robust features. In real-world scenes, however, similar clothing, varying light intensity, changes in pose and viewpoint, and occlusion greatly increase the difficulty of the task. In addition, most current methods use large backbone networks for feature extraction. These networks have a huge number of parameters, run slowly, and cause many problems when embedded in small camera devices, so lightweight models with few parameters and low complexity are needed.

Single-modal person re-identification has made great progress in recent years and has achieved high performance. However, single-modal models run into problems when deployed in real-world scenarios: many crimes and related events take place at night or in dark scenes, which ordinary cameras cannot capture but advanced infrared cameras can. Retrieving pedestrians across images captured by visible and infrared cameras has therefore become an important research direction. The main work of this thesis is as follows:

(1) Current methods based on global and local features, as well as attention-based methods, ignore the potential features of pedestrians. To address this limitation, this thesis proposes a method that extracts these potential features and thereby improves model performance. A multi-branch attention module extracts finer-grained local features from pedestrian images. To strengthen the model's feature mining capability, several non-local modules are used at different stages of the backbone network. To prevent the model from focusing too much on salient regions and ignoring potential information, new saliency filtering and suppression operations drive the model to extract potential and diverse pedestrian information. A new multi-stage global feature fusion module fuses salient features from different stages and increases feature diversity. Experimental results show that these improvements greatly improve model performance over the baseline.

(2) Current methods use large backbone models for feature extraction. These models often contain many parameters, whereas embedded devices have limited computational resources and limited real-time computing speed. To overcome this problem, this thesis proposes SCL-net, a new lightweight model for person re-identification. All of the model's underlying modules are rebuilt, and a new convolutional unit is proposed that constructs dimensionally richer feature maps through low-cost linear transformations and channel disruption operations. In addition, the channel attention and spatial attention modules are redesigned to be more lightweight and better adapted to the Re-ID task. Experimental results show that the proposed lightweight model is better suited to person Re-ID than mainstream lightweight networks (e.g., MobileNet).

(3) Representation-learning-based methods cannot effectively eliminate the modal differences between visible and infrared images, and methods based on generative adversarial networks require a series of generation and discrimination modules with high hardware and time costs. To address these problems, this thesis designs a style loss function based on deep network features to eliminate the modal differences between visible and infrared images, and a content loss function based on shallow network features to drive the different branches of the model to learn similar semantic information from visible and infrared images. Experimental results show that the model achieves state-of-the-art (SOTA) performance on the RegDB dataset in the setting where visible images are used to query infrared images.

Figures [34], tables [23], references [146]...
Keywords/Search Tags: Person re-identification, Deep learning, Local and global features, Lightweight networks, Cross-modal person re-identification networks
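
Illustrative sketch for contribution (3). The abstract names a style loss on deep network features and a content loss, but gives no formulas. The code below is only one plausible reading, not the thesis's actual method: it assumes a Gram-matrix style loss in the manner of neural style transfer and a plain mean-squared-error content loss, and it assumes the content loss is computed on shallow features. All function names, and the channels-by-positions feature layout, are hypothetical.

```python
from typing import List

# A feature map flattened to C channels x N spatial positions (N = H * W).
Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """Channel-by-channel inner products, divided by the number of positions.

    The Gram matrix keeps which channels co-activate but discards where,
    which is why it is often used as a 'style' statistic (assumption here).
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n for cj in features]
        for ci in features
    ]


def mse(x: Matrix, y: Matrix) -> float:
    """Mean squared error between two equally shaped matrices."""
    diffs = [(a - b) ** 2 for rx, ry in zip(x, y) for a, b in zip(rx, ry)]
    return sum(diffs) / len(diffs)


def style_loss(deep_visible: Matrix, deep_infrared: Matrix) -> float:
    """Assumed form: distance between Gram matrices of deep features."""
    return mse(gram_matrix(deep_visible), gram_matrix(deep_infrared))


def content_loss(shallow_visible: Matrix, shallow_infrared: Matrix) -> float:
    """Assumed form: position-wise distance between shallow features."""
    return mse(shallow_visible, shallow_infrared)


if __name__ == "__main__":
    # Toy features: 2 channels, 2 positions. The infrared map is the visible
    # map with its two positions swapped.
    visible = [[1.0, 2.0], [0.0, 1.0]]
    infrared = [[2.0, 1.0], [1.0, 0.0]]
    print("style loss:", style_loss(visible, infrared))
    print("content loss:", content_loss(visible, infrared))
```

In the toy example the two maps share the same channel statistics, so the Gram-based term does not penalise the swap of positions, while the position-wise term does. This is the usual motivation for pairing the two kinds of loss; how the thesis weights or combines them is not stated in the abstract.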