As information technology advances, computer vision has become an active research area, and 3D human pose estimation is a key step for tasks such as behavior recognition, action transfer, and person re-identification. It locates the 3D coordinates of human body keypoints in an image or video. Two-stage 3D human pose estimation from a single view has been studied extensively, with promising results. However, challenges remain, including noise in the predicted 2D poses, ambiguity between 2D and 3D poses, and limited datasets. This thesis investigates two-stage monocular 3D human pose estimation to address these challenges.

To address noisy 2D poses and the ambiguous correspondence between 2D and 3D poses, this thesis proposes a 2D sequence denoising module called TDSD. TDSD consists of two network modules: the Double Triangle Network (DTRINet) for local sequence information extraction and the LSTM Pose Network (LSPSNet) for global sequence information extraction. DTRINet divides the human body’s keypoints into local blocks and uses two triangular networks to extract local information, while LSPSNet treats the keypoints as a sequence and extracts global information with a sequence network. Three loss functions are used to combine the global and local information. By exploiting the perspective projection relationship, the 2D keypoints carry more depth information, which reduces the ambiguity between 2D and 3D poses. Overall, TDSD reduces noise in 2D poses through DTRINet, LSPSNet, and the fusion of sequence information.

To address the limited data available for 3D human pose estimation, this thesis introduces a 3D pose PoseGan data augmentation module comprising a PoseGan network and a 3D human pose validity verification mechanism. The PoseGan network consists of a 3D pose generator (Pose G) and a 3D pose discriminator (Pose D). Pose G takes normally distributed noise Z as input and generates samples that conform to the 3D pose data distribution, and Pose D judges whether its input samples are real or generated. The two networks are trained alternately. After training, Pose G generates 3D pose samples in batches, and the samples that satisfy the constraints of the human body are selected by defining the human skeleton and applying a verification based on human body mechanics. By producing a variety of valid 3D pose samples, the 3D pose PoseGan data augmentation module enriches the data and improves the model’s generalization ability. The refined 2D keypoints are also used for human image completion in this thesis.

The effectiveness of the proposed 2D sequence denoising module TDSD and the 3D pose PoseGan data augmentation module is verified and evaluated through extensive comparison and ablation experiments. The results indicate that both modules are effective. Compared with state-of-the-art methods in the field, the proposed model achieves lower errors on multiple evaluation metrics, demonstrating its competitiveness.
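
The selection of generated poses that satisfy human body constraints can be illustrated with a minimal sketch. The abstract does not specify the skeleton definition or the mechanical rules, so everything below is an assumption made for illustration: the 5-joint skeleton, the bone-length range, the left/right symmetry tolerance, and the function names `is_valid_pose` and `filter_valid` are hypothetical and are not taken from the thesis.

```python
import math

# Hypothetical 5-joint skeleton (illustration only, not the thesis's definition):
# 0 = pelvis, 1 = left hip, 2 = left knee, 3 = right hip, 4 = right knee
BONES = [(0, 1), (1, 2), (0, 3), (3, 4)]
# Assumed left/right bone pairs (indices into BONES) that should have similar length.
SYMMETRIC_PAIRS = [(0, 2), (1, 3)]
# Assumed plausible bone-length range, in metres.
MIN_LEN, MAX_LEN = 0.05, 0.60
# Assumed tolerance on the relative left/right length mismatch.
SYM_TOL = 0.10


def bone_lengths(pose):
    """Return the length of every bone for one pose (a list of (x, y, z) joints)."""
    return [math.dist(pose[a], pose[b]) for a, b in BONES]


def is_valid_pose(pose):
    """Accept a pose only if all bones are in range and left/right bones match."""
    lengths = bone_lengths(pose)
    if any(not (MIN_LEN <= length <= MAX_LEN) for length in lengths):
        return False
    for i, j in SYMMETRIC_PAIRS:
        longer = max(lengths[i], lengths[j])  # > 0 because of the range check
        if abs(lengths[i] - lengths[j]) / longer > SYM_TOL:
            return False
    return True


def filter_valid(poses):
    """Keep only the generated poses that pass the validity check."""
    return [pose for pose in poses if is_valid_pose(pose)]
```

In a pipeline of this kind, `filter_valid` would be applied to the batch of samples produced by the trained generator before they are added to the training data. A real verification step would likely also check joint-angle limits, which this sketch omits.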