3D human pose estimation aims to recover the full pose of the human body in 3D space from images or videos. With its growing range of applications, such as games, video analysis, and motion analysis, it has become an active research direction in computer vision. In recent years, as deep neural networks have achieved significant results in many fields, 3D human pose estimation has gradually shifted from traditional methods to deep learning approaches. However, most current research on pose estimation relies on excessively deep convolutional neural networks. Although this improves the accuracy of the model, it can introduce a large number of redundant parameters and wasted computation, so such networks cannot be deployed in scenarios where device memory is limited and timely feedback is required. To address these problems, this paper improves the network structure within a deep learning framework so that the network is both lightweight and accurate. The details are as follows:

1. Considering that the network model may contain many redundant parameters, this paper adopts the idea of constructing a compact network: the dense fully connected layers of the initial model are reorganized, in a fixed proportion, into a single hourglass, as a preliminary exploration of model compression. In addition, drawing on the strengths and structure of DenseNet, the network is modified to remove part of the convolution operations while retaining multi-layer information transmission, so that effective information continues to guide the forward pass and the joint-position error is effectively reduced (a minimal sketch of this dense connectivity is given after this abstract).

2. Layers at different scales are used in the network to better predict the position information of different joints, while about 75% of the useless parameters of the baseline network are removed. Within layers of the same scale, features are fused by identity mapping, and the joint predictions from different scale layers guide the prediction of the remaining joints. Combined, the two methods have a cumulative effect that improves the accuracy of the network model, and good results are obtained on the Human3.6M pose estimation dataset (the additive identity-mapping fusion and cross-scale guidance are sketched below).

3. Fusion in the network may lose some effective information. To eliminate this effect, this paper concatenates the extracted main features horizontally between network layers of the same scale. This does not increase the depth of the network, ensures that effective information is stored in the network as collective knowledge, and allows groups of features to be passed on in time, which helps to maximize the information flow in the network. At the final output, these features are added to the feature maps of the same scale to strengthen local information and to moderately guide the network. The further improved network structure reduces the parameters by 22%, and, by relying on this effective information, the accuracy is also improved.
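The following is a minimal, hypothetical sketch (not the authors' released code) of the idea in contribution 1: a fully connected lifting network whose hidden widths shrink and then expand like an hourglass, with DenseNet-style concatenation so that earlier features keep guiding later layers. The joint count, the layer widths, and the 2D-to-3D lifting setting are assumptions made for illustration.

```python
# Hypothetical sketch (not the authors' released code): a fully connected
# "hourglass" lifting network with DenseNet-style concatenation, assuming a
# 2D-to-3D lifting setting with 17 joints (as in Human3.6M).
import torch
import torch.nn as nn


class DenseFCHourglass(nn.Module):
    """Shrinks then expands the hidden width (hourglass), and concatenates
    earlier hidden features to later inputs so information keeps flowing."""

    def __init__(self, num_joints=17, widths=(1024, 512, 256, 512, 1024)):
        super().__init__()
        in_dim = num_joints * 2            # flattened 2D joint coordinates
        layers = []
        running_dim = in_dim               # grows because of concatenation
        for w in widths:
            layers.append(nn.Sequential(nn.Linear(running_dim, w),
                                        nn.BatchNorm1d(w),
                                        nn.ReLU(inplace=True)))
            running_dim += w               # dense connectivity: keep old features
        self.blocks = nn.ModuleList(layers)
        self.head = nn.Linear(running_dim, num_joints * 3)  # 3D joint output

    def forward(self, x):
        feats = [x]
        for block in self.blocks:
            out = block(torch.cat(feats, dim=1))  # concat all previous features
            feats.append(out)
        return self.head(torch.cat(feats, dim=1))


if __name__ == "__main__":
    model = DenseFCHourglass()
    pose_2d = torch.randn(8, 17 * 2)        # batch of flattened 2D poses
    pose_3d = model(pose_2d)
    print(pose_3d.shape)                    # torch.Size([8, 51])
```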
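Contribution 2 can be illustrated with a similarly hypothetical block: features at the same scale are fused by an additive identity mapping, while a coarser-scale prediction is upsampled and concatenated so that it guides the finer-scale joint prediction. The channel counts, the pooling factor, and the heatmap-style output are assumptions rather than details taken from the paper.

```python
# Hypothetical sketch: additive identity-mapping fusion within one scale and a
# coarser-scale branch whose joint prediction guides the finer scale, assuming
# a heatmap-style convolutional setting; layer widths are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_bn_relu(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch),
                         nn.ReLU(inplace=True))


class TwoScaleFusion(nn.Module):
    def __init__(self, channels=64, num_joints=17):
        super().__init__()
        self.fine_a = conv_bn_relu(channels, channels)
        self.fine_b = conv_bn_relu(channels, channels)
        self.coarse = conv_bn_relu(channels, channels)
        self.coarse_head = nn.Conv2d(channels, num_joints, 1)
        self.fine_head = nn.Conv2d(channels + num_joints, num_joints, 1)

    def forward(self, x):
        # Same-scale branch: identity mapping, features are fused by addition.
        fine = self.fine_b(self.fine_a(x)) + x

        # Coarser scale: downsample, predict joints, then upsample the
        # prediction so it can guide the finer-scale prediction.
        coarse = self.coarse(F.avg_pool2d(x, 2))
        coarse_joints = self.coarse_head(coarse)
        guide = F.interpolate(coarse_joints, size=fine.shape[-2:],
                              mode="bilinear", align_corners=False)

        # Concatenate the coarse guidance with the fused fine-scale features.
        return self.fine_head(torch.cat([fine, guide], dim=1))


if __name__ == "__main__":
    block = TwoScaleFusion()
    feats = torch.randn(2, 64, 32, 32)
    heatmaps = block(feats)
    print(heatmaps.shape)                   # torch.Size([2, 17, 32, 32])
```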