Research On Motion Capture Methods Based On Monocular Vision And Its Applications

Posted on: 2024-06-14 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Geng | Full Text: PDF
GTID: 2568306908982999 | Subject: Computer technology
Abstract/Summary:
With the rapid development of computer vision technology, motion capture has found applications in fields such as virtual reality, computer animation, sports analysis, and medical rehabilitation. This dissertation studies monocular vision-based motion capture methods and their applications.

(1) An improved joint-feature encoder, the deformable spatio-temporal convolutional residual network (DRN-T), is proposed as a modification of the ResNet encoder. DRN-T combines residual networks, deformable convolutions, and temporal features to improve, to some extent, the accuracy and robustness of joint-feature extraction. Specifically, the first standard convolutional layer of the ResNet residual block is replaced with a deformable convolution and fused with an LSTM, which improves performance under different poses and viewing angles. The model analyzes consecutive video frames to capture spatio-temporal information, so that the continuity of motion is taken into account and temporal information strengthens robustness. Features at different scales are fused to improve the detection of joints of different sizes and at different distances, which makes the model more adaptable across scenarios.

(2) A monocular 3D motion capture method based on kinematic joint layer constraints and a spatio-temporal convolutional network, called HK-TCN, is proposed as an improvement on the temporal dilated convolutional network. HK-TCN leverages joint constraints, local motion features, and temporal dependencies to enhance prediction accuracy. A Graph Convolutional Network (GCN) module is introduced to establish connections between joints and build a graph structure, which further optimizes joint position prediction. The input joint coordinates are divided into joint groups, which reduces complexity, improves computational efficiency, and captures local motion of the human body more effectively. Multi-layer temporal convolutions and dilated convolutions capture temporal dependencies at different time scales in complex motion sequences. A combined loss function, consisting of a mean squared error loss and a joint constraint loss, balances prediction accuracy against joint constraint satisfaction during optimization. As a result, HK-TCN generalizes better and maintains high predictive performance across different scenes and action types.

(3) A 3D human motion capture pipeline for monocular video streams is built on the DRN-T and HK-TCN models, providing end-to-end, fully automated processing from the input video stream to BVH output. Experimental results show that the method has high accuracy and stability. The work eases the limitations imposed by traditional multi-camera or sensor-based equipment, lowers the cost and complexity of motion capture, and improves portability, supporting wider application in virtual reality, computer animation, sports analysis, medical rehabilitation, and other fields.
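As a rough illustration of two ideas named in the abstract, the sketch below shows a combined loss (mean squared error plus a joint constraint term) and the number of frames covered by a stack of dilated temporal convolutions. The abstract does not give the form of the joint constraint, the loss weight, the skeleton, or the dilation rates, so the bone-length constraint, the `weight` value, the three-joint `BONES` list, and all function names here are assumptions made for illustration only, not the thesis implementation.

```python
import math

# Hypothetical toy skeleton: (parent, child) joint index pairs.
# The real HK-TCN skeleton and joint groups are not given in the abstract.
BONES = [(0, 1), (1, 2)]


def mse_loss(pred, target):
    """Mean squared error over every coordinate of every joint."""
    total = 0.0
    count = 0
    for p_joint, t_joint in zip(pred, target):
        for p, t in zip(p_joint, t_joint):
            total += (p - t) ** 2
            count += 1
    return total / count


def bone_length(pose, parent, child):
    """Euclidean distance between two joints of one pose."""
    return math.dist(pose[parent], pose[child])


def joint_constraint_loss(pred, target, bones=BONES):
    """Assumed constraint: mean squared difference in bone lengths."""
    errors = [
        (bone_length(pred, a, b) - bone_length(target, a, b)) ** 2
        for a, b in bones
    ]
    return sum(errors) / len(errors)


def combined_loss(pred, target, weight=0.1):
    """MSE plus a weighted constraint term; the weight is an assumption."""
    return mse_loss(pred, target) + weight * joint_constraint_loss(pred, target)


def receptive_field(kernel_size, dilations):
    """Frames seen by stacked stride-1 dilated 1D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    target = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]
    pred = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 3.0, 0.0)]
    print(combined_loss(pred, target))
    print(receptive_field(3, [1, 3, 9, 27]))
```

With kernel size 3 and the assumed dilation rates 1, 3, 9, and 27, four stacked layers cover 81 frames, which shows how dilation lets a shallow temporal network see long motion sequences. The constraint term penalizes predictions whose bone lengths drift from the reference even when the per-joint error is small.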
Keywords/Search Tags: motion capture, monocular vision, deformable residual net, time convolution, BVH data