Multi-view 3D human pose estimation is an important research direction in computer vision. With the continuing growth of datasets and advances in deep learning, research on human pose estimation has moved toward convolutional-neural-network-based methods and has achieved excellent results. It is widely applied in human-computer interaction, intelligent medical care, intelligent security, game production, film and television special effects, sports training, and other fields. At the same time, it faces growing challenges, one of which is occlusion, a problem that monocular 3D human pose estimation methods struggle to handle effectively. This thesis therefore investigates multi-view 3D human pose estimation from the following two aspects.

1. Multi-view fusion 3D human pose estimation based on 2D single-view optimization. Typical multi-view methods project the 2D pose estimated in each view directly into 3D space for reconstruction, so their accuracy is limited by the quality of the single-view 2D estimates. This thesis therefore proposes a 3D human pose estimation method based on 2D single-view optimization. Before projection, the 2D joint heatmaps from the different views are fused with one another so as to refine the heatmap of each individual view (a minimal sketch of this cross-view fusion step is given after the abstract). A multi-stage fusion structure then combines the geometric information carried by the intermediate 3D pose with the texture information carried by the original image, connecting the two stage by stage and progressively refining the 2D joint heatmaps. Finally, the 3D joint coordinates are reconstructed. Experiments verify that this method effectively improves the accuracy of the 3D joint coordinates.

2. Multi-view fusion 3D human pose estimation based on orthogonal projection. Methods based on 3D joint voxels achieve high 3D coordinate accuracy, but they introduce quantization error, their accuracy is constrained by the voxel size, and, at a given accuracy, the computational cost of the network grows with the cube of the 3D volume size. This thesis therefore proposes a 3D human pose estimation method based on orthogonal projection. The 3D voxel volume is projected onto three mutually orthogonal planes, and a 2D pose detector first produces the 2D joint heatmaps on each plane. A joint-center computation is then applied to reduce the impact of quantization error (a sketch of one such sub-pixel center computation is given below). Finally, a small network learns the weights for fusing the 2D joints across the different planes, and the fused results yield the 3D joint coordinates (also sketched below). Experiments verify that, compared with the same pipeline without orthogonal projection, this method slightly reduces joint error while improving inference speed by approximately 1.3 times.
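The abstract does not fix a concrete form for the cross-view heatmap fusion, so the following is only a minimal NumPy sketch of the general idea: each view's joint heatmaps are refined by adding the other views' heatmaps mapped through a pixel-to-pixel weight matrix, which in a trained model would encode the epipolar geometry between the views. The function name, the shapes, and the `fusion_weights` tensor are illustrative assumptions, not the thesis's actual network.

```python
import numpy as np

def fuse_cross_view_heatmaps(heatmaps, fusion_weights):
    """Refine each view's 2D joint heatmaps with evidence from the other views.

    heatmaps:       (V, J, H, W) array, one heatmap per joint per view.
    fusion_weights: (V, V, H*W, H*W) array; fusion_weights[u, v] maps the
                    flattened heatmap of view u onto the pixel grid of view v.
                    (Hypothetical layout; a trained model would learn these.)
    """
    V, J, H, W = heatmaps.shape
    flat = heatmaps.reshape(V, J, H * W)   # flatten the spatial grid
    fused = flat.copy()
    for v in range(V):                     # target view being refined
        for u in range(V):                 # source views contributing evidence
            if u == v:
                continue
            # map view u's evidence into view v's image plane and accumulate
            fused[v] += flat[u] @ fusion_weights[u, v].T
    return fused.reshape(V, J, H, W)

# Toy usage on random data (small grid, since each weight matrix is (H*W)^2)
V, J, H, W = 4, 17, 16, 16
heatmaps = np.random.rand(V, J, H, W).astype(np.float32)
weights = 1e-3 * np.random.rand(V, V, H * W, H * W).astype(np.float32)
refined = fuse_cross_view_heatmaps(heatmaps, weights)   # (4, 17, 16, 16)
```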
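The "center position" computation that reduces quantization error is left unspecified in the abstract; a common realization in heatmap-based pose estimation is a soft-argmax (integral) expectation over the response map, which the sketch below assumes. It returns a continuous sub-pixel coordinate instead of the hard argmax of a discretized grid; the sharpening factor `beta` is a hypothetical parameter.

```python
import numpy as np

def soft_argmax_2d(heatmap, beta=100.0):
    """Sub-pixel joint center as a softmax-weighted expectation of pixel
    coordinates (one assumed way to soften voxel/pixel quantization).

    heatmap: (H, W) response map for one joint on one projection plane.
    Returns (x, y) in continuous pixel units.
    """
    H, W = heatmap.shape
    logits = beta * (heatmap - heatmap.max())  # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    ys, xs = np.mgrid[0:H, 0:W]                # per-pixel integer coordinates
    return float((probs * xs).sum()), float((probs * ys).sum())

# A peak shared by two neighboring cells lands between integer coordinates:
hm = np.zeros((8, 8))
hm[3, 4] = hm[3, 5] = 1.0
print(soft_argmax_2d(hm))                      # ~ (4.5, 3.0)
```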
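For the final fusion step, note that each world axis is observed on exactly two of the three orthogonal planes (x on the xy and xz planes, y on xy and yz, z on xz and yz), so the per-plane 2D estimates can be blended with learned per-joint weights. The sketch below is only a stand-in for the "simple network" mentioned above; the weight layout `w` and the function name are hypothetical.

```python
import numpy as np

def fuse_orthogonal_planes(xy, xz, yz, w):
    """Blend per-plane 2D joint estimates into 3D coordinates.

    xy, xz, yz: (J, 2) joint positions on the three orthogonal planes,
                e.g. xy[j] = (x, y), xz[j] = (x, z), yz[j] = (y, z).
    w:          (J, 3) weights in [0, 1], one blend weight per joint and axis
                (hypothetical layout; assumed output of a small learned net).
    Returns a (J, 3) array of fused 3D joint coordinates.
    """
    x = w[:, 0] * xy[:, 0] + (1 - w[:, 0]) * xz[:, 0]  # x seen on xy and xz
    y = w[:, 1] * xy[:, 1] + (1 - w[:, 1]) * yz[:, 0]  # y seen on xy and yz
    z = w[:, 2] * xz[:, 1] + (1 - w[:, 2]) * yz[:, 1]  # z seen on xz and yz
    return np.stack([x, y, z], axis=1)

# With w = 0.5 everywhere this reduces to averaging the two observations per axis.
J = 17
pose3d = fuse_orthogonal_planes(np.random.rand(J, 2), np.random.rand(J, 2),
                                np.random.rand(J, 2), np.full((J, 3), 0.5))
```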