The goal of human pose estimation is to identify and locate the key points of the human body. Connecting these key points in the order of the human joints yields the skeleton of the body and, from it, the body's pose. However, existing methods still perform poorly under large pose deformation, partial occlusion, and complex backgrounds: difficult key points are still wrongly detected or missed, and the estimated poses can be ambiguous. Moreover, common public datasets contain few examples of such scenes. The main work and contributions of this paper are as follows.

(1) Building on the stacked hourglass network, this paper proposes a stacked hourglass network with cascaded feature fusion. The original stacked hourglass network extracts too little contextual information around key points, and its repeated down-sampling and up-sampling lose information, even though shallow features are particularly important for human pose estimation. To fully extract and use these shallow features, this paper adds a large-receptive-field residual module, a preprocessing module, and a horizontal (lateral) connection structure to the original network. Experimental results show that the stacked hourglass network with cascaded feature fusion improves the accuracy of human pose estimation.

(2) To address ambiguity in human pose estimation, a pose estimation method combined with human body parsing is proposed. Human body parsing applies semantic segmentation to the human body to segment its individual parts. The parsing results provide useful clues about the shape of the body, and the segmented body parts constrain where key points can lie. Combining the parsing information with the structural information of the body effectively corrects ambiguity in pose estimation.

(3) To address the scarcity of instances with large pose deformation, partial occlusion, and complex backgrounds in common public datasets, this paper uses semantic data augmentation to simulate real-world occlusion and synthesize occluded instances. The body is first segmented with the human body parsing method described above; body parts are then randomly selected and placed in the image to simulate occlusion, and the augmentation parameters are adjusted dynamically through adversarial learning. This produces customized training samples for the pose estimation network, which further improves the accuracy of human pose estimation.
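To make the idea of connecting key points "in the order of the human joints" concrete, here is a minimal sketch. It assumes a hypothetical 16-joint layout; the joint names, their indices, and the `SKELETON` edge list are illustrative assumptions, not the definition used in this paper. It turns a list of detected (x, y) key points into limb segments and skips any limb whose endpoint was missed.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical joint layout (illustrative only; each dataset defines its own order).
JOINTS = [
    "r_ankle", "r_knee", "r_hip", "l_hip", "l_knee", "l_ankle",
    "pelvis", "thorax", "upper_neck", "head_top",
    "r_wrist", "r_elbow", "r_shoulder", "l_shoulder", "l_elbow", "l_wrist",
]

# Pairs of joint indices connected along the body (assumed edge list).
SKELETON = [
    (0, 1), (1, 2), (2, 6), (6, 3), (3, 4), (4, 5),            # legs and hips
    (6, 7), (7, 8), (8, 9),                                    # torso and head
    (10, 11), (11, 12), (12, 7), (7, 13), (13, 14), (14, 15),  # arms
]


def build_skeleton(keypoints: List[Optional[Point]]) -> List[Tuple[Point, Point]]:
    """Return the limb segments whose two endpoints were both detected.

    A missed key point is given as None; every limb that touches it is dropped,
    which is how a single missed detection breaks the recovered pose.
    """
    if len(keypoints) != len(JOINTS):
        raise ValueError(f"expected {len(JOINTS)} key points, got {len(keypoints)}")
    segments = []
    for a, b in SKELETON:
        pa, pb = keypoints[a], keypoints[b]
        if pa is not None and pb is not None:
            segments.append((pa, pb))
    return segments
```

With all 16 joints detected the sketch returns 15 segments; if the right elbow (index 11) is missed, the two limbs that touch it disappear and 13 remain.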
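The argument in contribution (1), that repeated down-sampling and up-sampling lose information which shallow features can restore, can be checked on a toy feature map. In the sketch below, plain Python lists stand in for one feature channel. The 2×2 max pooling, nearest-neighbour up-sampling, and additive fusion are assumptions chosen for illustration, not the exact layers of the proposed network. A pool-then-upsample round trip smears a sharp peak over a 2×2 block, and adding the shallow feature back through a horizontal connection restores the exact peak location.

```python
from typing import List

Grid = List[List[float]]


def max_pool2(x: Grid) -> Grid:
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def upsample2(x: Grid) -> Grid:
    """Nearest-neighbour 2x up-sampling: each value fills a 2x2 block."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def lateral_fuse(shallow: Grid, deep_up: Grid) -> Grid:
    """Horizontal connection as an element-wise sum of shallow and up-sampled deep features."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(shallow, deep_up)]


# A shallow feature map with two sharp responses.
shallow = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]
round_trip = upsample2(max_pool2(shallow))  # the peak at (0, 1) is smeared over a 2x2 block
fused = lateral_fuse(shallow, round_trip)   # the shallow path makes (0, 1) the local maximum again
```

Here `round_trip[0]` is `[1.0, 1.0, 0.0, 0.0]`, so the peak's column can no longer be recovered, whereas `fused[0]` is `[1.0, 2.0, 0.0, 0.0]`.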
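Contribution (2) states that the segmented body parts constrain where key points can lie. One simple way such a constraint could work is to down-weight heatmap responses that fall outside the parsed region of the part a joint belongs to. The sketch below assumes a non-negative per-joint heatmap, a binary part mask of the same size, and a hypothetical `outside_weight` penalty. It illustrates a parsing constraint in general and is not the fusion scheme used in this paper.

```python
from typing import List, Tuple

Grid = List[List[float]]


def constrained_peak(heatmap: Grid, part_mask: List[List[int]],
                     outside_weight: float = 0.1) -> Tuple[int, int]:
    """Return (row, col) of the heatmap peak after penalising pixels outside the part mask.

    Assumes non-negative heatmap values. outside_weight is a hypothetical penalty:
    1.0 ignores the parsing result, 0.0 forbids key points outside the parsed part.
    """
    best, best_pos = float("-inf"), (0, 0)
    for i, (h_row, m_row) in enumerate(zip(heatmap, part_mask)):
        for j, (h, m) in enumerate(zip(h_row, m_row)):
            score = h if m else h * outside_weight
            if score > best:
                best, best_pos = score, (i, j)
    return best_pos


# Two competing peaks: the stronger one lies outside the parsed "left arm" region.
heatmap = [
    [0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.7],
]
left_arm_mask = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]
```

Without the constraint (`outside_weight=1.0`) the ambiguous detection at `(0, 1)` wins. With the default penalty the peak inside the parsed arm, `(2, 2)`, is chosen.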
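The occlusion-synthesis step in contribution (3) can be sketched on a single-channel image. Given a parsing map that labels each pixel with a body-part id, pick one part at random, copy its pixels, and paste them at a random offset so that they cover another region. Images are plain lists of lists here. The function name `paste_random_part` and its `max_shift` parameter are hypothetical, and a real pipeline would also need to handle colour channels, scaling, and blending.

```python
import random
from typing import List, Optional, Tuple

Grid = List[List[int]]


def paste_random_part(image: Grid, parsing: Grid, rng: random.Random,
                      max_shift: int = 2) -> Tuple[Grid, Optional[int]]:
    """Copy the pixels of one randomly chosen body part to a shifted position.

    parsing[i][j] is the part id of pixel (i, j); 0 means background.
    Returns the augmented image and the chosen part id (None if no part is present).
    """
    parts = sorted({v for row in parsing for v in row if v != 0})
    if not parts:
        return [list(r) for r in image], None
    part = rng.choice(parts)
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    h, w = len(image), len(image[0])
    out = [list(r) for r in image]  # copy so the input image is left untouched
    for i in range(h):
        for j in range(w):
            ti, tj = i + dy, j + dx
            if parsing[i][j] == part and 0 <= ti < h and 0 <= tj < w:
                out[ti][tj] = image[i][j]  # pasted part pixels occlude whatever was there
    return out, part
```

Passing a seeded `random.Random` makes each augmentation reproducible. Pixels are read from the original image rather than from `out`, so a part that is pasted onto itself does not cascade.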
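Contribution (3) also adjusts the augmentation parameters "in the way of adversarial learning". The sketch below substitutes a deliberately simple technique for whatever adversarial network the paper uses: a multiplicative-weights (exponentiated-loss) update over a hypothetical discrete set of augmentation settings. Settings that yield a higher pose-estimation loss than a baseline become more likely to be sampled, so the augmenter drifts toward samples the pose network finds hard. The class name, setting labels, and the `lr` and `baseline` parameters are all illustrative assumptions.

```python
import math
import random
from typing import Dict, List


class AdversarialAugmentPolicy:
    """Hypothetical policy that favours augmentation settings the pose network finds hard.

    A multiplicative-weights update stands in for adversarial learning here:
    each setting's log-weight grows when it produces an above-baseline pose loss.
    """

    def __init__(self, settings: List[str], lr: float = 0.5, seed: int = 0):
        self.settings = list(settings)
        self.log_w: Dict[str, float] = {s: 0.0 for s in self.settings}
        self.lr = lr
        self.rng = random.Random(seed)

    def probs(self) -> Dict[str, float]:
        """Softmax over log-weights (max subtracted for numerical stability)."""
        m = max(self.log_w.values())
        exp = {s: math.exp(v - m) for s, v in self.log_w.items()}
        z = sum(exp.values())
        return {s: e / z for s, e in exp.items()}

    def sample(self) -> str:
        """Draw one augmentation setting according to the current probabilities."""
        p = self.probs()
        return self.rng.choices(self.settings, weights=[p[s] for s in self.settings])[0]

    def update(self, setting: str, pose_loss: float, baseline: float) -> None:
        """Reward the augmenter when its sample is harder than the baseline for the pose network."""
        self.log_w[setting] += self.lr * (pose_loss - baseline)
```

In a training loop one would call `sample()`, augment a batch with that setting, compute the pose network's loss on it, and pass that loss to `update()` together with a running-mean baseline. The pose network is still trained to minimise the same loss, which is what makes the two sides adversarial.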