Research On Gesture Recognition Technology In Human-computer Interaction Of Virtual Scene

Posted on: 2019-11-21 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Z X Hu
GTID: 1368330596959596 | Subject: Mechanical and electrical engineering
Abstract/Summary:
Digital twin and virtual/augmented reality technologies have developed rapidly in recent years and have promoted innovation in many traditional industries, such as manufacturing, construction and education. Both technologies emphasize how to better map physical space to virtual space and how the two spaces can better interact. It is not difficult to imagine that physical space in many future scenarios will be accompanied by an ultra-realistic virtual space. The interaction between physical space and virtual space, and especially human-computer interaction in virtual space, is therefore an important and urgent problem.

By definition, the most important features of virtual reality are immersion and interactivity. The key to enhancing the user's immersion is the principle of consistency: the feedback or reaction of the virtual space should be synchronized with the state of the user's physical space. Existing virtual reality glasses mainly solve the problem of visual consistency: the user's viewing angle is obtained in real time by sensors such as the gyroscope, and the scene in virtual space is transformed accordingly in real time so that it stays synchronized with the user's vision. For interaction, the user's gestures in virtual space should likewise be consistent with those in physical space, that is, the hand pose of the user in physical space should be synchronized with the hand pose in virtual space, which makes gesture (hand pose) estimation the most suitable approach.

The research in this thesis focuses on the problem of gesture estimation. It is a challenging problem that has troubled many researchers. The main difficulties include high dimensionality, self-occlusion, uncontrollable environments, rapid change and a large amount of computation. In recent years, the development of depth sensors and deep learning has provided new solutions and ideas. Based on this background, the main research work and innovation results of this thesis are as follows:

(1) The depth image is acquired by a depth camera, and an appropriate number of joint points is selected as the model output to represent the corresponding hand pose. According to the characteristics of the acquired images, the data are augmented by adding noise and random disturbance, which mitigates the poor accuracy of the depth sensor and improves the robustness of the model. According to the task characteristics of gesture estimation, deep convolutional neural networks are studied, and a corresponding network model is constructed for end-to-end prediction of joint point coordinates and compared with traditional regression models.

(2) To improve the prediction accuracy of the network, the characteristics of deep convolutional neural networks are analyzed. From the perspective of improving feature extraction ability, a network structure with multi-scale feature fusion is studied, and its output function is optimized to improve the estimation performance of the network model. The effectiveness of the proposed method is verified by comparison with the work of others.

(3) To address the dependence of deep neural networks on data and to fully exploit the intrinsic information of the data set, unsupervised/weakly supervised learning methods are studied. Based on the characteristics of the data, reconstruction of the input image is used as a weakly supervised target. A weakly supervised optimization model based on the adversarial autoencoder is studied, and the prediction accuracy is improved. The intrinsic dimension of the data set is also studied.

(4) To address the poor accuracy of the depth sensor, the model input is changed from a single frame to continuous frames. To handle continuous-frame input, methods combining convolutional neural networks and recurrent neural networks are studied, and a convolutional recurrent neural network module is proposed. The module is compared with the proposed three-dimensional convolutional neural network in terms of final recognition performance.

(5) To make the model more lightweight, methods for simplifying and optimizing the neural network structure are studied. A three-dimensional separable convolution operation is proposed, which reduces the number of parameters, improves the operation speed and compresses the network model.

Finally, combined with the corresponding virtual reality development technology, relevant application cases were built.
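The parameter reduction mentioned in item (5) can be illustrated with a small counting sketch. The abstract does not specify how the three-dimensional separable convolution is constructed, so the sketch below assumes the common depthwise-plus-pointwise factorization; the function names and the layer sizes are hypothetical and are not taken from the thesis.

```python
def conv3d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard 3D convolution with a k*k*k kernel (bias ignored)."""
    return c_in * c_out * k ** 3


def separable_conv3d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise-separable 3D convolution (bias ignored).

    Assumption: the layer is factorized into a depthwise step followed by
    a pointwise step; the thesis may use a different factorization.
    """
    depthwise = c_in * k ** 3   # one k*k*k filter per input channel
    pointwise = c_in * c_out    # 1*1*1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out, k = 64, 128, 3  # hypothetical layer sizes
    standard = conv3d_params(c_in, c_out, k)
    separable = separable_conv3d_params(c_in, c_out, k)
    print(f"standard: {standard}, separable: {separable}")
```

Under this assumption, the ratio of separable to standard weights is 1/c_out + 1/k^3, so for the hypothetical layer above the separable form needs roughly 22 times fewer weights.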
Keywords/Search Tags: virtual reality, gesture interaction, depth image, deep learning, network optimization