The primary objective of 3D human body reconstruction is to recover 3D human body models with accurate shape and pose from monocular or multi-view images or videos. As one of the most active research topics in computer vision in recent years, 3D human body reconstruction has broad application prospects in fields such as human-computer interaction, intelligent surveillance, virtual reality, smart medical care, and sports science. It also plays an important role in downstream computer vision tasks such as behavior recognition, anomaly detection, and expression analysis, and has therefore attracted the attention of numerous researchers. However, the field still faces many challenges that limit its performance in practical applications and call for further research and optimization. Humans can easily perceive 3D shapes and poses in 2D images thanks to their rich life experience. Inferring 3D models from 2D images, however, is inherently ambiguous in the absence of prior information, and traditional methods struggle to reach human-level performance in unconstrained outdoor scenes with complex human movements and diverse appearances. On this basis, this thesis focuses on the open problems of 3D human pose estimation in unconstrained environments and carries out theoretical research and experimental verification. The main research work and contributions are as follows:

(1) To address the occlusion and high-speed motion blur that are common in outdoor environments, this thesis proposes a 3D human body reconstruction framework for video sequences based on focused attention and gated recurrent units. The framework uses a gated recurrent unit encoder to mine the historical pose information of the human body in the video sequence and uses it to compensate for the pose information of bodies, and especially of individual body parts, that are blurred by fast motion. In addition, this thesis introduces enhanced contextual features and spatial attention modules to further improve the network’s ability to extract human joint features. The framework also takes full advantage of the long-range feature capturing capability of the Transformer model and proposes a multi-layer focused encoder module that strengthens the joint modeling of fine-grained local interactions and coarse-grained global interactions. Meanwhile, the masked vertex modeling technique effectively enhances the model’s ability to recover occluded body parts.

(2) Conventional 3D human body reconstruction algorithms usually use the base template of a parametric model as prior information to constrain the positions involved in vertex-to-vertex interactions in the 3D human body mesh. However, because the base template is introduced only as an initialization, its position information does not generalize to unconstrained environments, and the prior information on human body shape and pose is not fully exploited. This thesis therefore proposes a dynamic model regression module that incorporates spatial attention and dynamic Gaussian attention to strengthen the model’s focus on local areas, especially human joint positions, and further improves prediction accuracy in unconstrained environments through iterative refinement. Additionally, to accelerate model convergence and take full advantage of the strong prior information provided by the proposed method, this thesis combines multiple loss functions, including a 3D keypoint loss, a 2D keypoint loss, a 2D re-projection loss, and a 3D vertex loss.

(3) Owing to the limitations of existing deep neural networks, the fusion of global human body shape and pose information with local body part features in 3D human body reconstruction is often unsatisfactory. To fully exploit the relationship between local and global features, this thesis proposes a 3D human body reconstruction algorithm based on a dual-branch network architecture and designs a feature interaction module to enable information exchange between the local and global branches. In addition, a feature fusion module is introduced to merge the local and global features that have undergone feature interaction, further improving the accuracy of 3D pose and shape prediction. This thesis also adopts a multi-constraint loss function, including pose loss, shape loss, vertex loss, and joint position loss, to optimize the network training process and enhance the generalization ability of the model.

The proposed method has been evaluated in comparative experiments and ablation studies on the well-known outdoor dataset 3DPW and the indoor dataset Human3.6M. It has also been evaluated on a high-speed motion-blurred version of the 3DPW dataset, which validates the effectiveness of the method. Furthermore, fine-tuning and prediction visualization were carried out on the hand dataset FreiHAND, which demonstrates the generality of the proposed method for human body part reconstruction. Compared with classical methods, the proposed method exhibits competitive performance.
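
As an illustration of how the four loss terms named in contribution (2) might be combined, the following is a minimal sketch and not the implementation used in this work. The L1 distance, the weak-perspective camera model, the equal default weights, and all function and parameter names are assumptions introduced here for illustration only; the abstract does not specify them.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Sequence[float]

# Hypothetical equal weights; the abstract does not state how terms are balanced.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "kp3d": 1.0, "kp2d": 1.0, "reproj": 1.0, "vertex": 1.0,
}


def mean_l1(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean per-point L1 distance between two equally sized point sets."""
    if len(pred) != len(target) or not pred:
        raise ValueError("point sets must be non-empty and equally sized")
    total = sum(abs(a - b) for p, t in zip(pred, target) for a, b in zip(p, t))
    return total / len(pred)


def project_weak_perspective(
    points_3d: Sequence[Point], scale: float, tx: float, ty: float
) -> List[Tuple[float, float]]:
    """Assumed camera model: drop depth, then translate and scale x and y."""
    return [(scale * (x + tx), scale * (y + ty)) for x, y, _z in points_3d]


def combined_loss(
    pred_vertices: Sequence[Point],
    gt_vertices: Sequence[Point],
    pred_joints_3d: Sequence[Point],
    gt_joints_3d: Sequence[Point],
    pred_joints_2d: Sequence[Point],
    gt_joints_2d: Sequence[Point],
    camera: Tuple[float, float, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of 3D keypoint, 2D keypoint, 2D re-projection and 3D vertex terms."""
    w = DEFAULT_WEIGHTS if weights is None else weights
    scale, tx, ty = camera
    # Re-projection term: project the predicted 3D joints into the image plane.
    reprojected = project_weak_perspective(pred_joints_3d, scale, tx, ty)
    terms = {
        "kp3d": mean_l1(pred_joints_3d, gt_joints_3d),
        "kp2d": mean_l1(pred_joints_2d, gt_joints_2d),
        "reproj": mean_l1(reprojected, gt_joints_2d),
        "vertex": mean_l1(pred_vertices, gt_vertices),
    }
    return sum(w[name] * value for name, value in terms.items())
```

In this sketch the 2D keypoint term compares directly predicted 2D joints with the annotations, while the re-projection term compares the projected 3D joints with the same annotations, so the two terms constrain different network outputs. A training framework would compute the same quantities on batched tensors.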