| Multi-person pose estimation can be divided into two phases. In the first phase, humans are detected in an in-the-wild image (human detection). In the second phase, keypoints are located within each detected human image patch (single-person pose estimation). Multi-person pose estimation has a wide range of applications in fields such as human-machine interaction, film production, and security surveillance. Given current trends, lightweight design is an important branch of deep-learning-based multi-person pose estimation. This thesis explores how to make multi-person pose estimation models lightweight. The main work is summarized as follows. For single-person pose estimation, a lightweight and efficient model is built on the widely used HRNet. First, a lightweight module, H-Blocks, is proposed. H-Blocks uses depthwise separable convolutions to reduce the size of HRNet, and adds an attention mechanism and the Mish activation function to improve accuracy. Second, a new upsampling method based on pixel shuffle is proposed to reduce feature loss. Third, a concatenation method is proposed to optimize the output structure of HRNet. Fourth, H-random erasing is proposed for data augmentation; it uses small-scale erasing to better match the task of localizing human keypoints. Fifth, a keypoint decoding method based on Taylor's formula is proposed to reduce the quantization error in human pose estimation. To evaluate the proposed techniques, a series of experiments is designed and conducted on the widely used MPII and COCO human pose estimation datasets. The experimental results demonstrate the efficiency and effectiveness of the proposed method. At the same accuracy, the parameter count and the amount of computation of the baseline model are reduced by 62% and 40% respectively, which is significantly better than other
lightweight models. For human detection, a lightweight model based on YOLOv4-Tiny is proposed with two improvements: S-Blocks and a dual attention subnet. S-Blocks, used for feature extraction, further reduces the computational complexity of YOLOv4-Tiny and increases inference speed. Specifically, S-Blocks draws on the idea of Inception and uses pointwise convolution to reduce the computational complexity. The dual attention subnet, composed of a spatial attention mechanism and a channel attention mechanism, is proposed to strengthen the extraction of key information from the backbone network. To evaluate the proposed techniques, a series of experiments is designed and conducted on the widely used VOC and COCO detection datasets. The experimental results show that, compared with the baseline network YOLOv4-Tiny, the proposed method is 6% faster while improving accuracy by 1.9% in mAP. Finally, based on the proposed single-person pose estimation and human detection methods, this thesis verifies the overall multi-person pose estimation model on the COCO dataset, where the model reaches an average precision (AP) of 74.2%. |
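
To illustrate why depthwise separable convolutions make a network lighter, the following minimal sketch compares weight counts for a standard convolution and its depthwise separable counterpart. It is not the H-Blocks design, which the abstract does not specify; the function names and the layer sizes (64 input channels, 64 output channels, 3 x 3 kernel) are assumptions chosen only for illustration, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 64, 64, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
ratio = sep / std  # algebraically equal to 1/c_out + 1/k**2
print(std, sep, round(ratio, 4))
```

For these sizes the separable layer keeps roughly one eighth of the weights. The whole-model reductions reported in the abstract also depend on the other design changes, so they cannot be derived from this single-layer ratio.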
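
The abstract does not give the formula behind the Taylor-based keypoint decoding, so the sketch below shows only one common formulation, under stated assumptions: the heatmap is roughly Gaussian near its peak, and a second-order Taylor expansion of the log-heatmap around the integer argmax yields a sub-pixel offset of minus the inverse Hessian times the gradient. The function name `decode_keypoint` and the synthetic heatmap are hypothetical and are not taken from the thesis.

```python
import math


def decode_keypoint(heatmap):
    """Return a sub-pixel (x, y) estimate from a 2D heatmap (list of rows)."""
    h, w = len(heatmap), len(heatmap[0])
    # Integer argmax: the coarse, quantized prediction.
    y0, x0 = max(((r, c) for r in range(h) for c in range(w)),
                 key=lambda rc: heatmap[rc[0]][rc[1]])
    # Without a full 3 x 3 neighbourhood, fall back to the coarse location.
    if not (0 < x0 < w - 1 and 0 < y0 < h - 1):
        return float(x0), float(y0)

    def log_at(y, x):
        # Clamp to avoid log(0) on empty heatmap cells.
        return math.log(max(heatmap[y][x], 1e-10))

    # Gradient of the log-heatmap (central differences).
    dx = 0.5 * (log_at(y0, x0 + 1) - log_at(y0, x0 - 1))
    dy = 0.5 * (log_at(y0 + 1, x0) - log_at(y0 - 1, x0))
    # Hessian entries of the log-heatmap.
    dxx = log_at(y0, x0 + 1) - 2 * log_at(y0, x0) + log_at(y0, x0 - 1)
    dyy = log_at(y0 + 1, x0) - 2 * log_at(y0, x0) + log_at(y0 - 1, x0)
    dxy = 0.25 * (log_at(y0 + 1, x0 + 1) - log_at(y0 + 1, x0 - 1)
                  - log_at(y0 - 1, x0 + 1) + log_at(y0 - 1, x0 - 1))
    det = dxx * dyy - dxy * dxy
    if abs(det) < 1e-12:
        return float(x0), float(y0)
    # Offset = -H^{-1} g, written out for the 2 x 2 case.
    off_x = -(dyy * dx - dxy * dy) / det
    off_y = -(dxx * dy - dxy * dx) / det
    return x0 + off_x, y0 + off_y


# Synthetic Gaussian heatmap with a hypothetical true peak at (3.3, 2.6).
mu_x, mu_y, sigma = 3.3, 2.6, 1.5
heatmap = [[math.exp(-((x - mu_x) ** 2 + (y - mu_y) ** 2) / (2 * sigma ** 2))
            for x in range(7)] for y in range(6)]
x_hat, y_hat = decode_keypoint(heatmap)
```

On this noise-free Gaussian the plain argmax gives (3, 3), while the refined estimate recovers the true peak up to floating-point error, because the log of a Gaussian is exactly quadratic. Real network heatmaps are noisier, so the correction there is only approximate.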