
Research On 2-Dimensional Human Pose Estimation Based On Lightweight Neural Network

Posted on: 2022-07-03    Degree: Master    Type: Thesis
Country: China    Candidate: Y Wang
GTID: 2518306524985409    Subject: Master of Engineering
Abstract/Summary:
2D human pose estimation is a fundamental yet challenging problem in computer vision. Its goal is to locate the coordinates of the body's key joints (for example the head, shoulders, and ankles) in the two-dimensional image plane. It has many applications, such as action recognition, gaming and entertainment, motion capture for film, and person re-identification. Human pose estimation has been studied since as early as the 1970s, but for a long time it was difficult to bring it to a practically usable level. With the rise of the large-scale ImageNet dataset and the AlexNet convolutional neural network, convolutional neural networks were combined with pose estimation, and only then did pose estimation reach a genuinely practical level. However, current state-of-the-art methods often rely on very wide and deep convolutional networks, which bring a huge number of parameters and floating-point operations. Although these methods are highly accurate, their major drawback is that such large models are very slow at inference, which makes them difficult to deploy on mobile devices or other low-power embedded devices. This thesis focuses on designing a lightweight human pose estimation network whose computation and parameter count are relatively small while its accuracy is not greatly reduced. The main work of this thesis is summarized as follows.
(1) Human pose estimation networks of several sizes are designed. The lightest one requires only 0.64 GFLOPs (floating-point operations). On the MPII dataset, one of these lightweight networks is also compared with state-of-the-art (SOTA) networks in terms of computation and accuracy.
(2) The overall network is built by stacking multiple encoder-decoder structures, each consisting of feature extraction (downsampling) followed by resolution recovery (upsampling). Through repeated feature extraction and resolution recovery, the network extracts multidimensional features.
(3) MobileNetV2, which is well suited to lightweight networks, is used as the feature extraction module of the network.
(4) The main innovation of this thesis is the use of pixel shuffle as the resolution recovery module. It replaces transposed convolution in the decoder and halves the number of parameters compared with the traditional decoder structure. The thesis gives a detailed formula-based derivation showing why this substitution is appropriate.
(5) To accelerate convergence and alleviate vanishing gradients, skip connections and intermediate supervision are used to keep training stable.
(6) Experiments on the proposed network structure show, with data, that each component serves its design purpose, and that the proposed structure strikes a good balance among parameter count, computation, accuracy, and convergence speed. In addition, the thesis compares the design with commonly used model compression methods from the literature, such as network pruning, low-rank approximation, and knowledge distillation, showing that it has advantages in both parameter count and accuracy over directly compressing mature networks.
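Pixel shuffle, the rearrangement step at the heart of sub-pixel convolution, has no learnable weights: it only moves the values of an input with C·r² channels at H×W resolution into C channels at rH×rW, so any learned upsampling weights live in the convolution that feeds it. A minimal pure-Python sketch of that rearrangement follows. It assumes the channel ordering used by PyTorch's `torch.nn.PixelShuffle`; the function name `pixel_shuffle` and the nested-list tensor layout are illustrative choices, not taken from the thesis, whose exact decoder is not described in this abstract.

```python
from typing import List

# A feature map stored as nested lists, indexed [channel][row][col].
Tensor3D = List[List[List[float]]]


def pixel_shuffle(x: Tensor3D, r: int) -> Tensor3D:
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r).

    Assumes the ordering of torch.nn.PixelShuffle: input channel
    c*r*r + i*r + j fills output channel c at rows h*r + i, cols w*r + j.
    No weights are involved; values are only moved.
    """
    if r < 1 or len(x) % (r * r) != 0:
        raise ValueError("channel count must be a multiple of r*r")
    height, width = len(x[0]), len(x[0][0])
    out_channels = len(x) // (r * r)
    out = [[[0.0] * (width * r) for _ in range(height * r)]
           for _ in range(out_channels)]
    for c in range(out_channels):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


if __name__ == "__main__":
    # Four 1 x 1 channels become one 2 x 2 channel.
    print(pixel_shuffle([[[1.0]], [[2.0]], [[3.0]], [[4.0]]], 2))
    # [[[1.0, 2.0], [3.0, 4.0]]]
```

In a real decoder a convolution producing C·r² channels would precede this step; how many parameters that saves relative to a transposed convolution depends on the kernel sizes and channel widths chosen, which the abstract does not state.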
Keywords/Search Tags: 2D human pose estimation, lightweight network design, sub-pixel convolution, convolutional neural network
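The MobileNetV2 feature extractor in item (3) keeps computation low mainly by replacing standard convolutions with depthwise separable ones (a per-channel k×k depthwise convolution followed by a 1×1 pointwise convolution). The sketch below counts weights and multiply-accumulates for both under simple assumptions: biases and batch-normalization parameters are ignored, stride is 1 with "same" padding, and the 64-channel, 64×64 layer in the example is hypothetical rather than a layer from the thesis. Papers differ on whether "FLOPs" means multiply-accumulates or twice that number, so such counts are only comparable under one convention.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution from c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise conv plus a 1 x 1 pointwise conv (no bias)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv mixes information across channels
    return depthwise + pointwise


def multiply_accumulates(weights: int, out_h: int, out_w: int) -> int:
    """MACs of conv layers at one resolution: each weight is used once per output pixel."""
    return weights * out_h * out_w


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernels, 64 -> 64 channels, 64 x 64 feature map.
    std = standard_conv_params(3, 64, 64)
    sep = depthwise_separable_params(3, 64, 64)
    print(std, sep)  # 36864 4672
    print(multiply_accumulates(std, 64, 64), multiply_accumulates(sep, 64, 64))
    # 150994944 19136512
```

For k = 3 and 64 input and output channels, the separable layer costs 1/9 + 1/64 of the standard one, about 7.9 times fewer weights and operations.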