Research On Human Pose Estimation Algorithm Based On Convolutional Neural Network

Posted on: 2021-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Mai
Full Text: PDF
GTID: 2438330611454122
Subject: Electronic and communication engineering
Abstract/Summary:
Human pose estimation is a technique for locating the key points of the head, shoulders, elbows, wrists, hips and knees of a person in a given image. It can be applied to human-computer interaction, motion analysis and action recognition. Compared with traditional pose estimation algorithms, convolutional neural network (CNN) methods have achieved a breakthrough, greatly improving the accuracy and generalization of pose estimation. Modern pose estimation widely uses CNNs to locate human key points by heat map regression. The main contributions of the thesis are as follows.

(1) A human pose estimation algorithm with multi-scale intermediate supervision is proposed. Current methods detect key points by regressing a series of heat maps produced by a CNN model. However, during training the supervision signal is a series of single-scale heat maps, which does not match the multi-scale key points of the ground truth. This kind of supervision makes the predicted key points more likely to deviate from their real positions. To improve prediction accuracy, this thesis proposes a model named human pose estimation via multi-scale intermediate supervision convolution network. First, each person is cropped from the full image and resized to form an input sample for the CNN model. The input samples are randomly rotated, scaled and flipped to expand the training set, which improves the generalization of the model. Then, for heat map generation, the thesis borrows the image pyramid principle from traditional digital image processing: heat maps are generated at three scales for each input sample, and the scale of a heat map is determined by the standard deviation of a two-dimensional Gaussian distribution.

The residual network model is composed of three stages that use ResNet50 as the backbone network. Each stage includes a ResNet50 and three deconvolution layers. Each ResNet50 consists of 16 residual modules, and each residual module contains a convolution layer in series with a batch normalization layer and an activation layer. ResNet50 extracts high-dimensional features; in the process the feature maps become smaller and their dimension increases. The deconvolution layers then restore the high-dimensional features to the same size and dimension as the heat maps. The whole process resembles an encoder-decoder. The heat maps output by the first, second and third stages correspond to the large, medium and small heat map annotations respectively, and intermediate supervision is applied twice, at the outputs of the first and second stages. In the test phase, the heat maps output by the last stage are used to compute the final key point coordinates by non-maximum suppression.

To demonstrate the effectiveness of the network, it is trained and tested on two benchmark datasets: the keypoint detection subset of the COCO (Common Objects in Context) dataset and the MPII Human Pose dataset. The COCO dataset contains 200,000 images and the MPII dataset contains 25,000 images. On the MPII validation set of 2958 images, PCK@0.1 (Percentage of Correct Keypoints) reached 37.2%, which is 2.1% higher than other methods. The PCKh result reached 89.94%. On the COCO validation set, mAP reached 75.5%, an increase of 1.2% over other methods, and the remaining metrics improved by 0.5% to 1.5%. The results indicate that the proposed multi-scale intermediate supervision convolutional network reduces the influence of the mismatch between the size of key points and the size of the ground-truth heat maps in human pose estimation, thus improving accuracy and achieving better performance when the evaluation criteria are stricter.

(2) An efficient convolutional network for human pose estimation is proposed. Existing pose estimation methods tend to focus on improving the generalization performance of the model, usually by adding parameters and enlarging the network. This does improve accuracy, but it ignores efficiency: the accuracy gain comes with a significant increase in computation and in parameter redundancy, so the model runs longer and less efficiently. In this thesis, the efficiency of pose estimation is improved by using EfficientNet as the backbone network of the pose estimation model. Two sub-networks, one for processing the feature output and one for regressing the heat maps, are studied separately, yielding two models, M0 and M0*, which further improve the efficiency of the model. Taking this pose estimation model as the baseline, the thesis studies scaling along three dimensions: network depth, network width and input resolution. Continuing to increase a single dimension quickly saturates the accuracy of the network and reduces the return on the added cost, so the thesis uses compound scaling, which expands all three dimensions at the same time; with each scaling step the computational complexity of the model increases by 1.5 to 2 times. At the same accuracy, the number of parameters and the amount of computation of the proposed method are far lower than those of methods from recent years.
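The multi-scale heat map targets of contribution (1) can be illustrated with a minimal sketch. It assumes a 64 x 48 heat map, standard deviations of 3, 2 and 1 pixels for the large, medium and small targets, and a plain arg-max decoder in place of the non-maximum suppression step; the abstract does not give the actual sizes or sigma values, and all function names here are hypothetical.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma):
    """Build a height x width heat map (list of rows) holding a 2D Gaussian
    with peak value 1.0 centred on the key point (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def multi_scale_heatmaps(width, height, cx, cy, sigmas=(3.0, 2.0, 1.0)):
    """Return one target per supervised stage: large, medium, small.
    The sigma values are illustrative assumptions, not the thesis settings."""
    return [gaussian_heatmap(width, height, cx, cy, s) for s in sigmas]


def decode_peak(heatmap):
    """Return the (x, y) position of the strongest response.
    This is a simple arg-max stand-in for the decoding step."""
    best_x, best_y, best_val = 0, 0, float("-inf")
    for y, row in enumerate(heatmap):
        for x, val in enumerate(row):
            if val > best_val:
                best_x, best_y, best_val = x, y, val
    return best_x, best_y


# Three targets for a single key point at (20, 30).
targets = multi_scale_heatmaps(64, 48, 20, 30)
```

All three targets peak at the same location; only the spread of the Gaussian differs, so the later stages are supervised with a tighter target than the earlier ones.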
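The PCK figures quoted above count a predicted key point as correct when it lies within a fraction of a reference length from the ground truth. The following is a minimal sketch of that idea, assuming a single person and a caller-supplied reference length (for PCKh this is conventionally the head segment length); the exact normalisation used in the thesis is not stated in the abstract, and the function name is hypothetical.

```python
import math


def pck(pred, gt, ref_length, alpha):
    """Fraction of key points whose prediction lies within
    alpha * ref_length of the ground truth.

    pred, gt   -- equal-length lists of (x, y) coordinates
    ref_length -- normalising length (assumption: supplied by the caller)
    alpha      -- threshold fraction, e.g. 0.1 for PCK@0.1
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    threshold = alpha * ref_length
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt)


# Made-up coordinates, for illustration only.
pred = [(10, 10), (20, 20), (30, 35)]
gt = [(10, 11), (20, 20), (30, 30)]
score = pck(pred, gt, ref_length=20, alpha=0.1)  # two of three within 2 px
```

A smaller alpha gives a stricter criterion, which is why PCK@0.1 values are much lower than PCKh values reported at a looser threshold.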
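The compound scaling of contribution (2) can be made concrete with a small arithmetic sketch. It assumes the coefficients published for the original EfficientNet (depth 1.2, width 1.1, resolution 1.15) and the usual approximation that convolutional cost grows with depth x width^2 x resolution^2; the coefficients actually used in the thesis are not given in the abstract, and the function names are hypothetical.

```python
# Coefficients from the original EfficientNet work (assumption: the thesis
# may use different values).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Multipliers for depth, width and resolution at scaling step phi."""
    return {
        "depth": alpha ** phi,
        "width": beta ** phi,
        "resolution": gamma ** phi,
    }


def flops_multiplier(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Approximate growth in computation: depth * width^2 * resolution^2."""
    return (alpha * beta ** 2 * gamma ** 2) ** phi


step = compound_scale(1)          # all three dimensions grow together
cost = flops_multiplier(1)        # roughly 1.92x per step
```

With these assumed coefficients each step multiplies the computation by about 1.92, which falls inside the 1.5 to 2 times range stated in the abstract.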
Keywords/Search Tags: human pose estimation (HPE), multi-scale, intermediate supervision, residual network, Efficient network, Depth, Width, Resolution