In the era of information explosion, people naturally want to extract the main content from this information so that the human activities it contains can be understood and processed effectively. Human pose estimation is one effective means to this end. Estimating the pose of a target person in an image requires detecting all of that person's keypoints. Detection is difficult, however, because people appear at different scales, keypoints are often occluded, and body poses vary widely. This paper uses the COCO dataset as its research object and, to further improve prediction accuracy, designs and implements a Transformer-based human pose estimation algorithm. The main research work is as follows.

(1) Three classical human pose estimation algorithms based on convolutional neural networks (CNNs), namely CPM, CPN and HRNet, are analyzed, and the basic frameworks of their network designs are studied. Through analysis of and comparative experiments on these algorithms, this paper examines how enlarging the receptive field and adopting multi-scale, multi-level network designs during CNN-based feature extraction affect accuracy. Building on the strengths of these algorithms, it then analyzes a basic network framework that introduces global features.

(2) A human pose estimation algorithm based on the Transformer, a global self-attention model, is studied. Compared with a CNN, the Transformer can learn global features of the image and thus obtain more accurate semantic information. To address the Transformer's heavy computational cost, the network structure is sparsified: only the k largest weights in the attention matrix are retained, which saves computing resources and improves the algorithm's efficiency. This yields a human pose estimation algorithm based on a sparse Transformer. Experiments show that the Transformer-based algorithm clearly outperforms the CNN-based ones, and that sparse optimization reduces computation without degrading performance.

(3) To further improve prediction accuracy, a high-resolution human pose estimation algorithm combining CNN and Transformer is proposed. Because the feature maps fed into the sparse Transformer lack contextual information, a context information acquisition module is added to the algorithm of Chapter 3. The module is implemented in two ways, one hierarchy-based and one based on dilated (hole) convolution, so that the resulting feature maps carry sufficient contextual information while preserving a high-resolution representation. Experimental results show that the improved network achieves higher prediction accuracy. Comparing the two variants, the hierarchy-based module yields the larger gain, while the dilated-convolution-based one has fewer parameters and higher efficiency.
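The top-k sparsification of the attention matrix mentioned in (2) can be illustrated with a small NumPy sketch. This is not the thesis's implementation. It assumes that the selection is applied per query row to the scaled dot-product scores, and that non-selected scores are masked to negative infinity before the softmax so the retained weights renormalize to 1. The names `topk_sparse_attention` and `top_k` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for stability; entries at -inf become exp(-inf) = 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def topk_sparse_attention(q, keys, v, top_k):
    """Hypothetical sketch: scaled dot-product attention keeping only the
    top_k largest scores in each query row; all other weights become 0."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    d = q.shape[-1]
    scores = q @ keys.T / np.sqrt(d)              # shape (n_queries, n_keys)
    top_k = min(top_k, scores.shape[-1])
    # Column indices of the top_k largest scores per row (unordered among themselves).
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    # Additive mask: 0 where a score is kept, -inf where it is dropped.
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)
    weights = softmax(scores + mask, axis=-1)     # sparse rows that still sum to 1
    return weights @ v, weights

# Usage: 4 queries attending over 6 keys, keeping 2 weights per query.
rng = np.random.default_rng(0)
q, keys, v = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 3))
out, w = topk_sparse_attention(q, keys, v, top_k=2)
```

As written, the sketch still forms the full score matrix before masking, so it shows the sparsity pattern rather than the actual compute or memory savings; realizing those depends on implementation details the abstract does not give.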
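The dilated-convolution variant of the context module in (3) rests on a standard property: a kernel with k weights and dilation rate d spans k + (k − 1)(d − 1) input positions, so dilation enlarges the receptive field without adding parameters or downsampling. The 1-D NumPy sketch below illustrates this with zero padding that keeps the output the same length as the input. It is an assumption-laden illustration, not the module described in the thesis, and the function names are hypothetical.

```python
import numpy as np

def effective_kernel_size(k, d):
    # Span of a k-tap kernel whose taps are spaced d positions apart.
    return k + (k - 1) * (d - 1)

def stacked_receptive_field(k, dilations):
    # Receptive field of stride-1 layers stacked in sequence: 1 + sum((k - 1) * d_i).
    return 1 + sum((k - 1) * d for d in dilations)

def dilated_conv1d(x, w, d):
    """Hypothetical sketch: 1-D dilated cross-correlation (the 'convolution'
    of deep-learning libraries) with zero padding; output length == input length."""
    k = len(w)
    span = effective_kernel_size(k, d)
    pad = (span - 1) // 2                     # exactly centred when span is odd
    xp = np.pad(x, (pad, span - 1 - pad))
    out = np.empty(len(x), dtype=float)
    for i in range(len(x)):
        # Taps at offsets 0, d, 2d, ..., (k - 1) * d within the window starting at i.
        out[i] = np.dot(xp[i : i + span : d], w)
    return out

# Usage: a 3-tap kernel with dilation 2 sees 5 input positions.
x = np.arange(10, dtype=float)
y = dilated_conv1d(x, np.array([1.0, 1.0, 1.0]), d=2)
```

For example, three stacked 3-tap layers with dilation rates 1, 2 and 4 reach a receptive field of 15 positions, versus 7 for the same three layers without dilation, with an identical number of weights.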