Human body pose and shape estimation is an important research topic in artificial intelligence and computer vision, with wide applications in human body modeling, VR/AR, 3D animation, and other fields. Current methods generalize poorly, depend heavily on the quality of the original image, perform badly in occluded scenes, and often predict distorted human motion. To address these problems, this thesis uses deep learning to estimate 3D human pose and shape based on the parameters of the Skinned Multi-Person Linear (SMPL) body model. The main content can be summarized as follows:

(1) A feature fusion method is proposed for 3D human body pose and shape estimation. In addition to the original image, human joint points and a semantic segmentation map are added as network inputs to reduce the dependence on image quality. First, the OpenPose and SCHP algorithms, which generalize well, are used to extract the joint points and the semantic segmentation map of the original image as human body structure information, which still provides effective cues to the network in occluded scenes. Second, MobileNet V3 and ResNet50 are selected to extract features from the human body structure information and the original image respectively, and the two feature streams are fused. Finally, the parameters of the SMPL body model are predicted with a cyclic iterative network.

(2) A Transformer is used to realize 3D human pose and shape estimation. First, image augmentation operations such as random occlusion and added noise are applied to the original image. Second, a Transformer replaces the traditional convolutional network to extract image features and predict SMPL parameters, improving generalization. Third, differentiable rendering maps the 3D model to a 2D binary silhouette, from which a loss function is constructed. Finally, known human motion priors are used to penalize implausible motion, alleviating the problem of distorted motion predictions.

(3) On the Human3.6M and 3DPW datasets, the two methods proposed in this thesis reduce the reconstruction error of existing methods based on human model parameters by 4%. In addition, a large number of half-length portraits and artificially occluded images were selected for testing. The results show that the two proposed methods generalize well and respectively alleviate the problems of poor predictions in occluded scenes and distorted human motion.
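The cyclic iterative prediction of SMPL parameters mentioned above can be sketched as an iterative-error-feedback loop: starting from a mean parameter vector, a small regressor repeatedly refines the pose/shape/camera estimate from the fused image feature. This is a minimal numpy sketch, not the thesis's trained network; the feature dimension (2048), parameter layout (72 pose + 10 shape + 3 camera), single linear layer, and random weights are illustrative assumptions.

```python
import numpy as np

FEAT_DIM = 2048          # assumed dimension of the fused image feature
PARAM_DIM = 72 + 10 + 3  # SMPL pose (72) + shape (10) + weak-perspective camera (3)

rng = np.random.default_rng(0)
# Stand-in for a trained regressor: one linear layer over [feature, current params]
W = rng.normal(scale=0.01, size=(PARAM_DIM, FEAT_DIM + PARAM_DIM))
b = np.zeros(PARAM_DIM)

def iterative_regress(feature, init_params, n_iters=3):
    """Refine SMPL parameters over n_iters residual-correction steps."""
    theta = init_params.copy()
    for _ in range(n_iters):
        x = np.concatenate([feature, theta])
        theta = theta + W @ x + b  # predict a residual and add it to the estimate
    return theta

feature = rng.normal(size=FEAT_DIM)
theta0 = np.zeros(PARAM_DIM)   # would be the mean SMPL parameters in practice
theta = iterative_regress(feature, theta0)
print(theta.shape)  # (85,)
```

Feeding the current estimate back into the regressor is what makes the loop "cyclic": each pass sees both the image evidence and its previous prediction, so it only has to learn a correction rather than the full mapping.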