Augmented Reality (AR) is a technology that closely integrates real-world and virtual-world information, superimposing the virtual world on the real world and allowing users to interact with it through a screen. AR abandons traditional input devices such as the mouse and keyboard in favor of natural, convenient interaction methods such as gestures and voice. Among these, gestures, as one of the ways people communicate with the real world, have become the first choice for AR human-computer interaction because they are natural, flexible, fast, and convenient. Within gesture research, topics such as directional interaction and pose estimation have attracted wide attention for their academic significance and practical application value.

Addressing the actual interaction requirements of AR scenarios, this thesis studies gesture-oriented interaction and hand pose estimation. The work on gesture-oriented interaction targets the algorithmic redundancy and complex pipelines of existing AR orientation-interaction methods; its research content covers dataset creation, algorithm design, and model application. The work on hand pose estimation aims to improve the accuracy and real-time performance of AR pose estimation, and is conducted on depth images, color images, and fused RGB-D data; its research content covers the design of pose-estimation algorithms for each image modality, experimental verification, and model porting. The specific tasks are as follows:

(1) To address the problems of existing gesture-oriented interaction, this thesis proposes a lightweight Hand Pose Orientation Interaction (HPOI) framework. HPOI supports hand detection in depth images, simplifies 3D pose estimation of the whole hand, determines the pointing direction from the coordinates of two index-finger joints, parses interaction instructions from the user's gestures, and then manipulates AR virtual objects. Experiments show that HPOI improves orientation-interaction accuracy and reduces pointing time; porting the framework's model to an AR device further verifies the feasibility of the method.

(2) For hand pose estimation from depth images, this thesis proposes a hand pose estimation algorithm, DIHPE-Net (Depth Image Hand Pose Estimation). The algorithm consists of three sub-modules: a feature extraction module, a pixel-coordinate module, and a U-Net codec. The feature extraction module has few parameters and quickly extracts hand features from depth images; the pixel-coordinate module maps hand features one-to-one onto a predefined hand joint graph; and the U-Net codec is a fourth-order symmetric network built from graph convolution, graph pooling, and graph unpooling that accurately regresses pixel coordinates to 3D joint coordinates. Experiments on the test datasets verify the accuracy and real-time performance of the algorithm, and applying it to a virtual-object moving scene increases interaction flexibility.

(3) For hand pose estimation from color images, this thesis proposes a hand pose estimation algorithm, RGBIHPE-Net (RGB Image Hand Pose Estimation). The network likewise contains three sub-modules: a feature extraction module, a joint-graph inference module, and a 3D-coordinate module. The feature extraction module quickly extracts hand features from color images; the joint-graph inference module enhances the hand features through the pixel-to-joint mapping defined by the hand joint graph; and the 3D-coordinate module transforms the enhanced features across feature dimensions to accurately predict the 3D joint coordinates. Trained end to end, the algorithm's accuracy and real-time performance are verified on the test datasets, and its feasibility is demonstrated by applying it to a virtual-object zooming scene.

(4) For hand pose estimation from RGB-D data, this thesis proposes an end-to-end fusion algorithm. Pose estimation based on a single image modality is affected by the characteristics of that modality and by environmental conditions, so its robustness and accuracy are limited. To address this, a novel RGB-D fusion model is proposed. Its feature extraction uses lightweight inverted residual blocks, and a graph convolutional network maps and matches hand features to 2D joints, enhancing the hand feature information and quickly regressing the 3D joint coordinates. Fusion is performed at the data, feature, and decision levels, comprehensively exploiting the correlation and complementarity of the two modalities to improve the accuracy and robustness of pose estimation. Finally, the effectiveness of the fusion model is verified on the test dataset.

In summary, this thesis designs algorithms for gesture-oriented interaction and for hand pose estimation from depth images, color images, and fused RGB-D data in augmented reality, achieving both good accuracy and good robustness. Further, the index-finger double-joint ray offers a new idea for subsequent gesture-oriented interaction on AR devices; the lightweight feature-extraction designs fully exploit the representational power of hand images captured by AR devices; the diverse graph-convolution designs provide a range of algorithmic options for subsequent hand-joint estimation on AR devices; and the thesis fully exploits the complementary information of RGB-D data in the pose-estimation task, laying a solid foundation for pose estimation from multimodal visual data.
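The index-finger double-joint ray of contribution (1) can be sketched in a few lines: a ray is cast from one index-finger joint through the next, and a virtual object is treated as selected when the ray hits its bounding sphere. This is a minimal geometric sketch, assuming a proximal-joint/fingertip pair and a sphere approximation of the object; it is not the thesis's exact implementation.

```python
import numpy as np

def pointing_ray(pip_joint, tip_joint):
    """Build a pointing ray from two index-finger joints.

    The ray starts at the proximal joint and passes through the
    fingertip; the direction is normalized to unit length.
    """
    origin = np.asarray(pip_joint, dtype=float)
    tip = np.asarray(tip_joint, dtype=float)
    direction = tip - origin
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("joints coincide; ray direction is undefined")
    return origin, direction / norm

def intersect_sphere(origin, direction, center, radius):
    """Nearest ray/sphere intersection distance, or None on a miss.

    Used to test whether the pointing ray selects a virtual object
    approximated by a bounding sphere.
    """
    oc = origin - np.asarray(center, dtype=float)
    b = np.dot(oc, direction)
    c = np.dot(oc, oc) - radius ** 2
    disc = b * b - c
    if disc < 0:
        return None  # ray misses the sphere entirely
    t = -b - np.sqrt(disc)
    return t if t >= 0 else None
```

Using only two joints this way avoids estimating the full hand pose when the user merely wants to point, which is the source of HPOI's reduced pointing time.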
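The graph-convolution building block shared by contributions (2)–(4) operates on the hand joint graph: each joint's feature is updated from its skeletal neighbors via a normalized adjacency matrix. Below is a minimal single-layer sketch; the five-joint chain, the ReLU activation, and the specific normalization are illustrative assumptions, not the networks' actual configuration.

```python
import numpy as np

def normalized_adjacency(edges, n):
    """Symmetrically normalized adjacency with self-loops.

    Computes D^{-1/2} (A + I) D^{-1/2} for an undirected joint graph,
    so that each joint aggregates from itself and its skeletal neighbors.
    """
    a = np.eye(n)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(h, a_hat, w):
    """One graph-convolution layer: ReLU(A_hat @ H @ W)."""
    return np.maximum(a_hat @ h @ w, 0.0)
```

In the thesis's codec, layers like this are stacked with graph pooling and unpooling into a fourth-order symmetric structure, so that joint features are coarsened and refined before the final regression to 3D coordinates.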
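The decision-level branch of the RGB-D fusion in contribution (4) can be illustrated with a minimal sketch: each modality produces its own set of 3D joint predictions, and the final estimate is a per-joint confidence-weighted average. The 21-joint hand skeleton and the convex-combination weighting are assumptions made for illustration; the thesis's fusion model combines the modalities at the data and feature levels as well.

```python
import numpy as np

NUM_JOINTS = 21  # common hand-skeleton size (wrist + 4 joints per finger)

def decision_fusion(joints_rgb, joints_depth, conf_rgb, conf_depth):
    """Fuse two (NUM_JOINTS, 3) joint predictions by confidence weighting.

    conf_rgb / conf_depth are per-joint confidence scores of shape
    (NUM_JOINTS,); each fused joint is the convex combination of the
    two modality-specific estimates.
    """
    joints_rgb = np.asarray(joints_rgb, dtype=float)
    joints_depth = np.asarray(joints_depth, dtype=float)
    w_rgb = np.asarray(conf_rgb, dtype=float)
    w_depth = np.asarray(conf_depth, dtype=float)
    alpha = (w_rgb / (w_rgb + w_depth))[:, None]  # (NUM_JOINTS, 1)
    return alpha * joints_rgb + (1.0 - alpha) * joints_depth
```

Weighting per joint lets the fusion lean on depth where color is ambiguous (e.g. skin-colored backgrounds) and on color where depth is noisy, which is the complementarity the fusion model exploits.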