Pedestrian detection based on deep learning is an important research topic in computer vision and is widely used in intelligent monitoring, security, traffic control, intelligent assisted driving, and virtual reality. Problems such as spatial occlusion and cluttered backgrounds lead to insufficient feature expression in the model and poor detection results. As the technology advances, computer vision models are trending toward multi-task learning, which couples multiple tasks while remaining efficient and using few parameters. Based on a multi-task learning model, this paper addresses the problems of complex, changeable human poses and background confusion to improve the model's detection ability, and on this basis designs a method to address key problems such as mutual occlusion and small targets in dense scenes. The main work is as follows:

1) For the problems of complex backgrounds and changeable human poses in pedestrian detection, CenterNet is selected as the base model for designing a multi-task alignment network. Because CenterNet does not consider the differences between tasks at the feature level, a separation module is added to the network to decompose the original features into features that each focus on their own task. This avoids the feature conflicts caused by sharing the original features, improves the model's ability to extract key features, and improves pedestrian detection under complex backgrounds. To alleviate the deviation between the pose and detection tasks, the product of the detection and pose prediction scores is used as an alignment metric that quantifies the degree of alignment between tasks. Meanwhile, task weights are used to balance the tasks, so as to resolve the conflict between them and improve the accuracy of both target detection and pose keypoint detection. The multi-task alignment network was verified on the CrowdPose dataset. Compared with the original
model, the AP and AR values increased by 10.1% and 2.0%, respectively, and the pose estimation AP and AR values increased by 8.2% and 2.6%, respectively.

2) To handle pedestrian occlusion and small targets in dense scenes, a multi-task interactive network is designed on top of the multi-task alignment network. A deep aggregation network is used to improve the feature extraction ability of the model. To reduce the influence of deformed objects on detection, deformable convolution is introduced to achieve accurate mapping between different layers during convolution, which enhances the model's ability to represent deformed objects. To promote information transfer between tasks, the global features and pose features of pedestrians are fed into the interactive network, and a consistency constraint is used to construct a loss function. This helps the model learn the information shared by the global and pose features of pedestrians, which alleviates the deviation between the target detection and pose recognition tasks and improves detection accuracy. In the training stage, small targets are copied and pasted to increase both the number of positive samples for small targets and how often they appear in an image, so that the model learns small targets more thoroughly. The multi-task interactive network was verified on the CrowdPose dataset. Compared with the original model, the target detection AP and AR values increased by 15.6% and 6.5%, respectively, and the pose estimation AP and AR values increased by 21.8% and 11.6%, respectively.
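
The alignment measurement described in 1) can be illustrated with a minimal sketch. The abstract states only that the product of the detection and pose prediction scores is used and that the tasks are balanced by weights; the function names, the candidate ranking step, and the weight values below are assumptions made for illustration, not the paper's implementation.

```python
from typing import List, Tuple


def alignment_metric(det_score: float, pose_score: float) -> float:
    """Product of the detection and pose scores for one candidate.

    The product is high only when both tasks are confident about the
    same candidate, so it is low wherever the two tasks disagree.
    """
    return det_score * pose_score


def rank_by_alignment(candidates: List[Tuple[float, float]]) -> List[int]:
    """Return candidate indices sorted from most to least aligned.

    Ranking candidates this way is an illustrative use of the metric,
    not something the abstract specifies.
    """
    scores = [alignment_metric(d, p) for d, p in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def weighted_multitask_loss(det_loss: float, pose_loss: float,
                            w_det: float = 1.0, w_pose: float = 1.0) -> float:
    """Weighted sum of the per-task losses (weight values are hypothetical)."""
    return w_det * det_loss + w_pose * pose_loss


if __name__ == "__main__":
    # (detection score, pose score) for three hypothetical candidates
    candidates = [(0.9, 0.2), (0.7, 0.7), (0.3, 0.95)]
    print(rank_by_alignment(candidates))  # [1, 2, 0]
    print(weighted_multitask_loss(0.5, 1.2, w_det=1.0, w_pose=0.5))
```

In this toy example the candidate with moderate but consistent scores ranks first, ahead of candidates that are strong on only one task, which is the behaviour a product-based alignment measure is meant to encourage.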
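
The copy-and-paste step for small targets described in 2) might look roughly like the following sketch. It assumes a grayscale image stored as a nested list and boxes given as (x, y, width, height); the area threshold, the number of copies, and the random placement rule are hypothetical, since the abstract does not specify them.

```python
import random
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def copy_paste_small_objects(image: List[List[int]], boxes: List[Box],
                             max_area: int = 16, copies: int = 2,
                             seed: int = 0) -> Tuple[List[List[int]], List[Box]]:
    """Duplicate every box whose area is at most max_area, `copies` times.

    Each copy is pasted at a random position that fits inside the image.
    Overlap with existing objects is not checked in this sketch, and all
    boxes are assumed to lie fully inside the image.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]           # leave the input untouched
    new_boxes = list(boxes)
    for x, y, w, h in boxes:
        if w * h > max_area:
            continue                          # only small targets are copied
        patch = [row[x:x + w] for row in image[y:y + h]]
        for _ in range(copies):
            nx = rng.randint(0, width - w)    # randint is inclusive at both ends
            ny = rng.randint(0, height - h)
            for dy in range(h):
                out[ny + dy][nx:nx + w] = patch[dy]
            new_boxes.append((nx, ny, w, h))  # pasted copy becomes a new positive
    return out, new_boxes
```

Each pasted copy adds one more positive sample for a small target and makes such targets appear more often per image. A practical version would also need to avoid pasting over existing pedestrians and to copy the associated keypoint annotations along with the box.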