| In modern technologies such as motion recognition, virtual reality, and human-computer interaction, a motion capture system captures the 3D joint positions of the human body in images or videos. At present, however, motion capture requires the subject to wear specialized equipment, which is often inconvenient. 3D human pose estimation can effectively solve this problem. Lifting a two-dimensional image to three-dimensional space suffers from inherent depth ambiguity, and previous methods based on convolutional or recurrent neural networks cannot compute in parallel, struggle to capture long-range dependencies, and have an excessive number of parameters. This paper therefore proposes an improved Transformer-based 3D human pose estimation model and an effective semi-supervised training method. To address the inability of previous 3D human pose models to capture long-range dependencies and their excessive parameter counts, this paper designs an improved Transformer-based encoder, called the Dilated Transformer Encoder (DTE), which simply and effectively lifts a long sequence of 2D keypoint positions to a single-frame 3D pose. The overall model is divided into two parts: a spatial module and a temporal module. The spatial module uses the original Transformer encoder to extract the relative spatial position information of the joints in each frame. The temporal module uses the DTE to model the long-term dependencies of the 2D pose sequence. To reduce sequence redundancy, a temporal dilated convolution replaces the fully connected layer. The improved DTE not only captures long-range dependencies by using the attention mechanism to extract global information, but also effectively aggregates local information through the temporal dilated convolution. At the same time, the properties of the temporal dilated convolution significantly reduce the computational cost. Experiments on the Human3.6M dataset show that the results exceed most existing work. Although the error is 0.6 mm higher than that of the best reported result, the computational cost is only 63% of that method's. To address the difficulty of acquiring labeled data for human pose estimation, this paper also designs a semi-supervised training method for settings with only a small number of labeled human pose samples. First, a preliminary 3D human pose estimation network is trained on a small amount of labeled 2D data; a large amount of unlabeled data is then passed through this network to obtain 3D poses. The resulting 3D poses are projected back into 2D space and processed to obtain pseudo-labeled data. The preliminary 3D human pose estimation network is then trained on the combination of labeled and pseudo-labeled data. The semi-supervised method was compared with supervised learning on the Human3.6M dataset: with only 10% of the S1 data labeled, semi-supervised learning reduces the error by 9.7 mm relative to supervised learning. |
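
The way a temporal dilated convolution can reduce a long 2D keypoint sequence toward a single frame may be easier to see in a small sketch. The following is a minimal pure-Python illustration, not the paper's implementation. The function names (`dilated_conv1d`, `temporal_reduce`), the 243-frame input, the kernel size of 3, the dilation schedule (1, 3, 9, 27, 81), and the fixed averaging weights are all assumptions chosen for illustration; the abstract does not specify any of them, and a real model would learn the weights.

```python
def dilated_conv1d(seq, taps, dilation):
    """Unpadded 1D dilated convolution over a list of per-frame feature vectors.

    seq      : list of T frames, each a list of C floats
    taps     : k scalar weights, shared across channels (illustrative only)
    dilation : temporal gap between the frames seen by consecutive taps
    """
    span = dilation * (len(taps) - 1)  # frames consumed by one kernel window
    out = []
    for t in range(len(seq) - span):
        frame = [0.0] * len(seq[0])
        for j, w in enumerate(taps):
            src = seq[t + j * dilation]
            for c, v in enumerate(src):
                frame[c] += w * v
        out.append(frame)
    return out


def temporal_reduce(seq, taps=(1 / 3, 1 / 3, 1 / 3), dilations=(1, 3, 9, 27, 81)):
    """Stack dilated convolutions and record the sequence length after each layer."""
    lengths = [len(seq)]
    for d in dilations:
        seq = dilated_conv1d(seq, taps, d)
        lengths.append(len(seq))
    return seq, lengths


# Hypothetical input: 243 frames with 2 features each (frame index, constant).
frames = [[float(t), 1.0] for t in range(243)]
out, lengths = temporal_reduce(frames)
print(lengths)   # sequence length after each layer
print(len(out))  # number of frames left at the end
```

Under these assumptions each layer shortens the sequence by `dilation * (kernel_size - 1)` frames, so five layers with three taps each cover all 243 input frames while using far fewer weights than a fully connected layer over the whole sequence would.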