With the development of augmented reality (AR) and computer vision, a single two-dimensional scene and outdated interaction methods can no longer satisfy the growing needs of users. The essence of AR is to superimpose rendered virtual objects on the real-world layer captured by the camera. Because the virtual objects are three-dimensional while the camera images are two-dimensional, the two cannot be perfectly superimposed; visual errors such as misalignment and jitter arise, and the user experience suffers. Moreover, traditional interaction methods cannot meet AR interaction requirements: they work only with additional auxiliary equipment such as data gloves or handheld controllers, which lacks naturalness and constrains development. Modern AR systems (such as Kinect) are costly, unusable outdoors, and require a handheld controller for interaction, so they lack broad applicability. To make AR easy to realize and natural to use, this paper combines simultaneous localization and mapping (SLAM) and neural-network techniques, both of which have developed rapidly in computer vision in recent years. The work of this paper is mainly reflected in the following aspects:

1. Point cloud detection and plane reconstruction with a monocular vision method. Visual SLAM reconstructs a 3D point cloud by determining the depth of each point, which is measured across multiple frames by epipolar geometry. An appropriate feature-point algorithm is selected to acquire the 3D point cloud, and the visual-odometry part of the visual SLAM pipeline is improved: an IMU module mitigates tracking loss during rapid motion, and a three-dimensional point cloud with depth information is finally constructed. The point cloud is searched through a KD-tree, and planes are fitted by the least-squares method.

2. Gesture recognition through convolutional neural network training. Convolutional neural networks can extract multi-scale features and perform well in image recognition, so a convolutional neural network is used for gesture recognition. The MobileNet architecture replaces an ordinary network; it computes quickly, has low hardware requirements, and can run on mobile devices. An SVM classifier replaces the Softmax classifier in the network: the features produced by the fully connected layer are extracted and used to train the SVM in a second stage, yielding a recognition model with high accuracy and good generalization.

Experiments show that the improved visual SLAM can reconstruct a sparse point cloud stably and quickly with monocular vision, then combine the point-cloud information to extract spatial-plane information and recognize the spatial structure. Training the CNN with the improved MobileNet architecture for gesture recognition effectively reduces the amount of computation. Together, these contributions make AR easier to realize and more natural to use.
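The depth recovery described above — measuring a point's depth across frames via epipolar geometry — can be sketched as a linear (DLT) triangulation from two views. This is a minimal illustration, not the paper's implementation; the projection matrices and pixel coordinates below are hypothetical placeholders for what a feature-matching front end would supply.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two views.

    P1, P2 : 3x4 camera projection matrices of the two frames.
    x1, x2 : matching normalized pixel coordinates (u, v) in each view.
    Returns the 3D point, from which depth along the camera axis follows.
    """
    # Each view contributes two linear constraints u*(P[2].X) = P[0].X, etc.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The 3D point is the right null vector of A (smallest singular value).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

In a full pipeline the same idea runs over many matched feature points per frame pair, accumulating the sparse point cloud the abstract refers to.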
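The plane-fitting step — least-squares fitting on point-cloud neighborhoods — can be sketched as a total-least-squares fit via SVD. This is an illustrative sketch; in the described system the candidate neighborhoods would come from KD-tree queries (e.g. `scipy.spatial.cKDTree`, assumed here rather than shown), while the fit itself needs only NumPy.

```python
import numpy as np

def fit_plane(points):
    """Total-least-squares plane fit to an (N, 3) array of 3D points.

    Returns (normal, d) with normal . p + d ~ 0 for points p on the
    plane.  The normal is the singular vector of the centered points
    associated with the smallest singular value, i.e. the direction
    of least variance.
    """
    centroid = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - centroid)
    normal = Vt[-1]
    d = -normal @ centroid
    return normal, d
```

The residual (the smallest singular value) gives a natural threshold for accepting or rejecting a neighborhood as planar when extracting spatial structure from the sparse cloud.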
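The second-stage classifier idea — training an SVM on features taken from the network's fully connected layer instead of using Softmax — can be illustrated with a minimal linear SVM trained by Pegasos-style subgradient descent. This is a self-contained sketch: the feature vectors are synthetic stand-ins for real fully-connected-layer activations, and a production system would more likely use an off-the-shelf SVM implementation.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Minimal linear SVM: hinge loss + L2 regularization,
    optimized with Pegasos-style stochastic subgradient descent.

    X : (n, d) feature matrix (here standing in for CNN features).
    y : labels in {-1, +1}.
    Returns weight vector w and bias b.
    """
    rng = np.random.default_rng(0)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y[i] * (X[i] @ w + b)
            w *= (1 - eta * lam)           # regularization shrinkage
            if margin < 1:                 # inside margin: hinge subgradient
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b

def predict(w, b, X):
    """Sign of the decision function gives the predicted class."""
    return np.sign(X @ w + b)
```

The hinge loss only penalizes points inside the margin, which is what gives the second-stage classifier the margin-maximizing behavior credited with better generalization than the Softmax head.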