Visual odometry recovers the 3D structure of the surrounding environment and estimates the poses of a moving camera by analyzing an image sequence. It plays an important role in robot navigation, autonomous driving, virtual reality (VR), augmented reality (AR), 3D reconstruction, and related applications. In recent years, deep learning and convolutional neural networks have achieved remarkable success in computer vision tasks such as image recognition and tracking, which has led researchers to apply deep learning to visual odometry. In this paper, we study visual odometry based on unsupervised deep learning and design a network framework that performs scene depth estimation and camera pose estimation simultaneously. We also carry out targeted optimizations to address problems in existing methods. The main contents and innovations of this paper are summarized as follows:

1. An occlusion-mask-based optimization method for depth estimation is proposed. When training with an image reconstruction loss, existing methods ignore the reconstruction errors that arise in occluded regions between spatial image pairs; these errors corrupt the loss and degrade the depth estimation performance of the trained network. To solve this problem, we first propose a method for computing a binary occlusion mask, and then use the mask to optimize the spatial-domain image reconstruction loss of the designed framework, filtering the occluded regions out of the loss calculation so that they no longer harm training (an illustrative sketch of such a masked loss is given after this abstract). As a result, the depth estimation performance of the framework improves. Experiments on the public KITTI dataset show that, with this method, the designed framework outperforms state-of-the-art methods in depth estimation by a large margin. Camera pose estimation performance also improves, indicating that better depth estimation indirectly benefits pose estimation.

2. An optical-flow-based optimization method for pose estimation is proposed. Because pose estimation operates on image pairs, pixel-matching information between adjacent images is essential. However, the pose estimation networks of existing methods tend to extract texture features of the images, which limits pose estimation performance. To solve this problem, we add an optical flow estimation sub-module as an auxiliary module to the pose estimation network, which originally consists of a feature extraction module and a pose estimation sub-module. The feature extraction module is shared by the pose and optical flow sub-modules, a corresponding temporal-domain loss function is designed, and the two tasks are trained jointly, which guides the shared feature extraction module to learn pixel-matching information between adjacent frames that is better suited to pose estimation (a sketch of this shared-encoder design also follows the abstract). Experiments on the KITTI dataset show that the proposed method further improves the pose estimation performance of the designed framework. Good pose estimation results are also obtained on the Malaga-Urban and Uav-Town datasets, indicating that this method also improves the generalization ability of the pose estimation network across different scenes.
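The following is a minimal PyTorch sketch of the occlusion-masked reconstruction loss idea from contribution 1: occluded pixels are excluded from the photometric loss instead of contributing spurious error. The mask computation shown here (a simple depth-consistency test between the target depth and the warped source depth) is an illustrative assumption; the thesis proposes its own binary occlusion mask calculation, which is not detailed in this abstract.

```python
# Sketch, not the thesis implementation: masked spatial image reconstruction loss.
import torch


def masked_reconstruction_loss(target, source_warped, occlusion_mask, eps=1e-7):
    """L1 photometric loss averaged only over non-occluded pixels.

    target, source_warped: (B, 3, H, W) images in the target view.
    occlusion_mask:        (B, 1, H, W) binary mask, 1 = visible, 0 = occluded.
    """
    photometric = (target - source_warped).abs().mean(dim=1, keepdim=True)  # (B, 1, H, W)
    # Occluded pixels are filtered out of the loss instead of adding noise to it.
    return (photometric * occlusion_mask).sum() / (occlusion_mask.sum() + eps)


def depth_consistency_mask(depth_target, depth_source_warped, thresh=0.05):
    """Hypothetical mask: flag pixels whose warped source depth disagrees with the
    target depth by more than `thresh` (relative error) as occluded."""
    rel_err = (depth_target - depth_source_warped).abs() / (depth_target + 1e-7)
    return (rel_err < thresh).float()


# Example with random tensors standing in for network outputs.
B, H, W = 2, 128, 416
target = torch.rand(B, 3, H, W)
warped = torch.rand(B, 3, H, W)
mask = depth_consistency_mask(torch.rand(B, 1, H, W) + 0.5,
                              torch.rand(B, 1, H, W) + 0.5)
loss = masked_reconstruction_loss(target, warped, mask)
```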
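The next sketch illustrates the shared-encoder idea from contribution 2: one feature extraction module feeds both a 6-DoF pose head and an optical-flow head, so that joint training pushes the shared features toward pixel-matching information between adjacent frames. The layer sizes, the single-scale flow head, and the output scaling below are illustrative assumptions, not the architecture used in the thesis.

```python
# Sketch, not the thesis implementation: a shared encoder with pose and flow heads.
import torch
import torch.nn as nn


class SharedEncoder(nn.Module):
    def __init__(self, in_ch=6):  # two RGB frames stacked along the channel axis
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, pair):          # (B, 6, H, W)
        return self.net(pair)         # (B, 128, H/8, W/8)


class PoseHead(nn.Module):
    def __init__(self, feat_ch=128):
        super().__init__()
        self.conv = nn.Conv2d(feat_ch, 6, 1)  # 3 translation + 3 rotation parameters

    def forward(self, feat):
        # Global average over the feature map; the 0.01 scale is a common trick in
        # unsupervised VO to keep initial pose predictions small (assumption here).
        return 0.01 * self.conv(feat).mean(dim=(2, 3))   # (B, 6)


class FlowHead(nn.Module):
    def __init__(self, feat_ch=128):
        super().__init__()
        self.conv = nn.Conv2d(feat_ch, 2, 3, padding=1)  # dense 2-D flow at feature resolution

    def forward(self, feat):
        return self.conv(feat)


encoder, pose_head, flow_head = SharedEncoder(), PoseHead(), FlowHead()
pair = torch.rand(2, 6, 128, 416)                 # two stacked adjacent frames
feat = encoder(pair)
pose, flow = pose_head(feat), flow_head(feat)     # (2, 6) and (2, 2, 16, 52)
# Joint training would combine a pose-related temporal loss with a flow loss and
# backpropagate both through the shared encoder.
```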