In recent years, image-based human pose estimation has been an active and challenging research problem in the computer vision community. Accurate and efficient human pose estimation can support a variety of complex vision tasks, such as action recognition, human tracking and video analysis. Through the sustained efforts of researchers, human pose estimation has made great progress, but it still faces considerable challenges in both single-person and multi-person settings in complex scenes. The difficulty lies mainly in the following: for single-person pose estimation, self-occlusion or occlusion by other objects can seriously degrade the estimated pose; for multi-person pose estimation, the presence of multiple targets together with occlusion makes satisfactory results even harder to obtain.

For single-person pose estimation, this thesis proposes a novel multi-stage network architecture with two branches in each stage to estimate human poses in images. The first branch predicts a confidence map for each joint; compared with feature-level representations, these maps capture more explicit information about joint locations. The second branch introduces a new bi-directional graph structure information model (BGSIM), which encodes rich contextual information. BGSIM takes full account of the occlusion relationships among different joints in order to cope with self-occlusion and occlusion by other objects, thereby facilitating the prediction of joint points.

For multi-person pose estimation in complex scenes, the task is more challenging because the number of people is unknown and the same body joints of different people may overlap. Building on the single-person method, we add corresponding post-processing steps for the multi-person case, namely Integer Linear Programming (ILP) and soft non-maximum suppression (soft-NMS), both based on BGSIM. The ILP labels and partitions the body part candidates, including occluded keypoints, and helps determine the number of people in an image, thereby avoiding false associations. Soft-NMS does not discard non-maximum keypoint detections outright but instead decays their scores, which allows overlapping joints to be separated and effectively reduces false joint connections.

On the single-person LSP dataset, the proposed method achieves an average precision of 82.3; on the multi-person COCO Keypoint Challenge and MPII datasets, it achieves average precisions of 62.9 and 77.6, respectively. Compared with other existing pose estimation algorithms, the proposed approach performs on par with them on the LSP dataset and achieves the best performance on the COCO and MPII datasets. In addition, the proposed method achieves the best results on our selected multi-person dataset without any extra training.
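The joint confidence maps predicted by the first branch are commonly trained against Gaussian-shaped targets, one peak per joint location. The sketch below assumes that common convention; it is not taken from the thesis. It builds the map for one joint type (one Gaussian per person, merged with an element-wise max) and then reads joint candidates back out as local maxima, which shows why such a map carries explicit location information. The names `joint_confidence_map` and `local_peaks`, the `sigma` of 2 pixels and the 0.1 threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

ConfidenceMap = List[List[float]]


def joint_confidence_map(width: int, height: int,
                         joints: Sequence[Tuple[int, int]],
                         sigma: float = 2.0) -> ConfidenceMap:
    """Build one confidence map for a single joint type.

    Each (x, y) in `joints` is that joint's location for one person. A
    Gaussian of std `sigma` (an assumed value, in pixels) is centred on it,
    and overlapping Gaussians are merged with an element-wise max.
    """
    cmap = [[0.0] * width for _ in range(height)]
    for jx, jy in joints:
        for y in range(height):
            for x in range(width):
                d2 = (x - jx) ** 2 + (y - jy) ** 2
                cmap[y][x] = max(cmap[y][x], math.exp(-d2 / (2 * sigma ** 2)))
    return cmap


def local_peaks(cmap: ConfidenceMap,
                thresh: float = 0.1) -> List[Tuple[int, int, float]]:
    """Return (x, y, score) for every pixel that reaches `thresh` and is not
    smaller than any of its 4-connected neighbours (the joint candidates)."""
    h, w = len(cmap), len(cmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = cmap[y][x]
            if v < thresh:
                continue
            neighbours = [cmap[ny][nx]
                          for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                          if 0 <= nx < w and 0 <= ny < h]
            if all(v >= n for n in neighbours):
                peaks.append((x, y, v))
    return peaks


cmap = joint_confidence_map(20, 10, [(4, 5), (14, 5)])
print(local_peaks(cmap))  # [(4, 5, 1.0), (14, 5, 1.0)]
```

Merging with a max rather than a sum keeps each peak at exactly 1.0 even when two people's joints are close, so nearby people do not inflate each other's confidence.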
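The role of the ILP step can be illustrated with a deliberately simplified stand-in. The thesis's own formulation, which labels and partitions all body part candidates and is built on BGSIM, is not given here, so the sketch below assumes a much smaller problem: pairing the candidates of two joint types (for example necks and heads) so that each candidate belongs to at most one person and the total pairing score is maximal. The function `solve_pairing_ilp` and the scores are hypothetical, and the 0/1 program is solved by brute-force enumeration rather than by an ILP solver.

```python
import itertools
from typing import List, Tuple

Pair = Tuple[int, int]


def solve_pairing_ilp(scores: List[List[float]],
                      min_score: float = 0.0) -> Tuple[List[Pair], float]:
    """Solve a tiny 0/1 integer linear program by exhaustive enumeration.

    Variable z_ij in {0, 1} says whether candidate i of joint type A and
    candidate j of joint type B belong to the same person.
        maximise   sum_ij scores[i][j] * z_ij
        subject to sum_j z_ij <= 1 for each i,  sum_i z_ij <= 1 for each j.
    Pairs scoring <= min_score are never considered. Exponential time:
    suitable for toy inputs only.
    """
    n_b = len(scores[0]) if scores else 0
    edges = [(i, j) for i in range(len(scores)) for j in range(n_b)
             if scores[i][j] > min_score]
    best_sel: List[Pair] = []
    best_val = 0.0
    for r in range(1, len(edges) + 1):
        for sel in itertools.combinations(edges, r):
            # constraint: each candidate is assigned to at most one person
            if len({i for i, _ in sel}) < r or len({j for _, j in sel}) < r:
                continue
            val = sum(scores[i][j] for i, j in sel)
            if val > best_val:
                best_sel, best_val = list(sel), val
    return best_sel, best_val


# rows: neck candidates, columns: head candidates (hypothetical scores)
scores = [[0.9, 0.8],
          [0.85, 0.1]]
pairs, total = solve_pairing_ilp(scores)
print(pairs, len(pairs))  # [(0, 1), (1, 0)] 2
```

A greedy matcher would take the single best pair (0, 0) first and be left with (1, 1), for a total of 1.0; solving the integer program jointly selects (0, 1) and (1, 0) with a total of about 1.65. The number of selected pairs, two here, doubles as a person count.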
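Soft-NMS was originally proposed for bounding-box detection, where the overlap between two detections is measured by IoU. The sketch below applies its Gaussian decay rule to the keypoint candidates of one joint type. Because the thesis's BGSIM-based variant is not detailed here, the distance-based `proximity` measure and the `radius`, `sigma` and `score_thresh` values are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Candidate = Tuple[float, float, float]  # (x, y, score)


def proximity(a: Sequence[float], b: Sequence[float], radius: float) -> float:
    """Hypothetical overlap measure in [0, 1] for two keypoints: 1 when they
    coincide, falling linearly to 0 at `radius` pixels apart."""
    d = math.hypot(a[0] - b[0], a[1] - b[1])
    return max(0.0, 1.0 - d / radius)


def soft_nms_keypoints(cands: Sequence[Candidate], radius: float = 8.0,
                       sigma: float = 0.5,
                       score_thresh: float = 0.05) -> List[Candidate]:
    """Gaussian soft-NMS over the candidates of one joint type.

    Hard NMS would delete every candidate close to a higher-scoring one;
    here its score is multiplied by exp(-overlap**2 / sigma) instead, and a
    candidate is only dropped once its score falls below `score_thresh`.
    """
    remaining = [list(c) for c in cands]
    kept: List[Candidate] = []
    while remaining:
        # select the current highest-scoring candidate
        best_idx = max(range(len(remaining)), key=lambda k: remaining[k][2])
        best = remaining.pop(best_idx)
        kept.append((best[0], best[1], best[2]))
        # decay, rather than delete, the scores of nearby candidates
        for c in remaining:
            c[2] *= math.exp(-proximity(best, c, radius) ** 2 / sigma)
        remaining = [c for c in remaining if c[2] >= score_thresh]
    return kept


cands = [(10.0, 10.0, 0.9), (12.0, 10.0, 0.8), (40.0, 40.0, 0.7)]
for x, y, s in soft_nms_keypoints(cands):
    print(x, y, round(s, 3))
# 10.0 10.0 0.9
# 40.0 40.0 0.7
# 12.0 10.0 0.26
```

The candidate at (12, 10) survives with a reduced score of about 0.26 instead of being removed, so a later grouping step can still assign it to a second, overlapping person; hard NMS with the same radius would have discarded it.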