In recent years, pedestrian attribute recognition has become an important research topic in computer vision. It aims to infer a set of semantic attributes of a given target person, such as gender, age, clothing style, or other appearance cues, so that a specific pedestrian can be located through discrete and precise attribute descriptions. It provides detailed soft biometric information and important semantic cues for pedestrian re-identification, pedestrian retrieval, intelligent surveillance analysis, and other applications. In practice, pedestrian images are degraded by many complex factors during capture and processing, such as illumination changes, occlusion, unbalanced attribute distributions, and low image resolution. Exploring the spatial and semantic relations among attributes is therefore essential for improving recognition performance, and the central difficulty of pedestrian attribute recognition is how to extract distinct visual semantic features from an input image, perform the corresponding attribute discrimination, and keep that discrimination robust. To overcome the shortcomings of existing pedestrian attribute recognition algorithms, this paper establishes two deep-learning-based network models for pedestrian attribute recognition.

To address the low recognition accuracy on hard samples and the unbalanced attribute distribution in attribute datasets, a pedestrian attribute recognition network based on residual attention is first built. A ResNet-50 backbone extracts pedestrian attribute features with semantic information. On this basis, a residual attention structure over attribute categories focuses on the key regions where each attribute appears and analyzes the internal relations between different attribute categories. Normalization and an asymmetric weighted loss are applied at the end of the network to reduce the impact of the unbalanced distribution of pedestrian attribute samples and to accelerate model convergence. Several rounds of training and testing on the PETA and PA100K datasets, which are commonly adopted in pedestrian attribute recognition, verify that the proposed method improves the ability of pedestrian attribute recognition.
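An asymmetric weighted loss can take several forms; the following is a minimal PyTorch-style sketch of one widely used asymmetric multi-label formulation, given only to illustrate the idea of down-weighting easy negative samples. The class name, the focusing exponents gamma_pos and gamma_neg, and the probability margin clip are assumptions for illustration, not values taken from this paper.

```python
import torch
import torch.nn as nn

class AsymmetricWeightedLoss(nn.Module):
    """Illustrative asymmetric weighted multi-label loss.

    Positive and negative terms of the binary cross-entropy are focused with
    different exponents so that easy negatives, which dominate when the
    attribute distribution is unbalanced, contribute less to the gradient.
    gamma_pos, gamma_neg and the probability margin `clip` are assumed
    hyperparameters, not values reported in the paper.
    """

    def __init__(self, gamma_pos=0.0, gamma_neg=4.0, clip=0.05, eps=1e-8):
        super().__init__()
        self.gamma_pos = gamma_pos
        self.gamma_neg = gamma_neg
        self.clip = clip
        self.eps = eps

    def forward(self, logits, targets):
        # logits, targets: (batch, num_attributes); targets are 0/1 labels.
        p = torch.sigmoid(logits)
        # Shift negative probabilities so that very easy negatives are discarded.
        p_neg = (1.0 - p + self.clip).clamp(max=1.0)
        loss_pos = targets * (1.0 - p).pow(self.gamma_pos) * torch.log(p.clamp(min=self.eps))
        loss_neg = (1.0 - targets) * (1.0 - p_neg).pow(self.gamma_neg) * torch.log(p_neg.clamp(min=self.eps))
        return -(loss_pos + loss_neg).mean()
```

In use, such a criterion simply replaces the standard binary cross-entropy over the multi-label attribute logits, e.g. `loss = AsymmetricWeightedLoss()(model(images), attribute_labels)`.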
Although convolutional neural networks extract local features and rich geometric information of pedestrians well, they have difficulty capturing the relations between pedestrian attributes and global attribute cues; recognition accuracy on such complex attributes is low, and attribute details may be lost. To solve these problems, this paper further proposes a Swin Transformer based network model for pedestrian attribute recognition. The model consists of three parts: a Swin Transformer encoding module, a decoding module, and an attribute prediction module. The Swin Transformer encoding module serves as a feature extractor that adaptively learns the spatial interactions between the appearance of the pedestrian image and the body regions. The transformer decoding module then further extracts attribute features and enhances the feature representation to model the complex semantic interactions among the attributes. Finally, two prediction heads predict the pedestrian attributes. Experiments on the PETA, PA100K, RAP1, and RAP2 datasets indicate that the recognition methods proposed in this paper perform better than other models.
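As a rough illustration of the encoder-decoder arrangement described above, the sketch below wires an arbitrary Swin-style backbone to a standard transformer decoder driven by learned attribute queries, followed by two prediction heads. The encoder interface, embedding dimension, number of queries, and head design are assumptions for illustration and not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class AttributeTransformerHead(nn.Module):
    """Illustrative encoder-decoder sketch for attribute recognition.

    `encoder` is assumed to be any backbone (e.g. a Swin Transformer with its
    classification head removed) that maps an image to a sequence of patch
    tokens of shape (batch, num_tokens, embed_dim). Learned attribute queries
    attend to these tokens through a standard transformer decoder, and two
    heads produce attribute logits: one from the decoded per-attribute tokens
    and one from the pooled image feature.
    """

    def __init__(self, encoder, embed_dim=768, num_attributes=35,
                 num_decoder_layers=2, num_heads=8):
        super().__init__()
        self.encoder = encoder
        self.queries = nn.Parameter(torch.randn(num_attributes, embed_dim))
        layer = nn.TransformerDecoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_decoder_layers)
        self.token_head = nn.Linear(embed_dim, 1)                 # one logit per decoded query
        self.global_head = nn.Linear(embed_dim, num_attributes)   # logits from pooled feature

    def forward(self, images):
        tokens = self.encoder(images)                      # (B, N, C) patch tokens
        queries = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        decoded = self.decoder(queries, tokens)            # (B, num_attributes, C)
        logits_a = self.token_head(decoded).squeeze(-1)    # (B, num_attributes)
        logits_b = self.global_head(tokens.mean(dim=1))    # (B, num_attributes)
        return logits_a, logits_b
```

Keeping one logit per decoded attribute query makes the per-attribute cross-attention explicit, while the pooled global head behaves like a conventional multi-label classifier; the two sets of logits can be supervised jointly or fused at inference time.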