
Research On Monocular Visual SLAM Algorithm Combined With CNN And IMU In Complex Environments

Posted on: 2021-05-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Ma
Full Text: PDF
GTID: 1368330602982915
Subject: Mechanical and electrical engineering
Abstract/Summary:
A monocular visual SLAM system is a computer vision system that uses a single camera to collect image information for localization and map construction. SLAM is widely used in 3D reconstruction, visual obstacle avoidance, and path planning, and it is of great significance for improving the autonomy and intelligence of unmanned autonomous systems. It is a research hotspot in computer vision and has developed rapidly. However, existing visual SLAM systems are mainly based on the feature point method. Practical applications often involve complex scenes with lighting changes, missing texture, fast rotation, and moving targets. In these environments, traditional monocular visual SLAM algorithms cannot fully perceive the environment and also suffer from motion degeneracy, so there is considerable room to improve their accuracy and robustness. This dissertation therefore studies several problems in existing SLAM algorithms in depth. The main contributions are as follows:

(1) Initialization of a monocular visual SLAM system is complex, requires strict conditions, and leaves the scale undetermined. Building on recent progress in deep learning theory and in the computing power of GPUs and other hardware, this dissertation introduces a CNN-based depth prediction network. The network provides depth values for the key pixels in the initial image frame, from which a scale is estimated for the system. This alleviates the scale uncertainty of the monocular visual SLAM system to a certain extent and lowers its initialization threshold: the system can be initialized without translation, and it can still be initialized successfully under rapid rotation. To improve the accuracy of the depth values of the key pixels in subsequent frames, this dissertation designs a brightness value coding sequence pattern for the pixels, which increases the structural stability and correlation between pixels and improves pixel matching. When the TUM dataset provides information such as photometric model parameters and exposure time, experimental results show that the proposed algorithm achieves some improvement in accuracy over the current leading algorithm DSO.

(2) Monocular visual SLAM systems often work in large, unstructured, dynamic outdoor scenes, yet the monocular visual SLAM algorithm depends on static landmarks. To adapt the system to dynamic working environments, this dissertation introduces Mask R-CNN to identify and segment specific targets in key frames. The segmentation region of each target is multiplied by a value specific to its target category and written into a single shared mask, so that each mask pixel value indicates the target category of the corresponding pixel in the original image. Because the network cannot perceive how targets move through space and time, this dissertation classifies the target categories in the scene as static or potentially dynamic; cars and people, for example, are potentially dynamic targets. The coordinates of the key pixels are mapped to the mask pixels, and the actual state of each potentially dynamic pixel is determined by a motion consistency check. Experimental results show that the proposed algorithm is more accurate than DSO in dynamic environments.

(3) As discussed above, a CNN-based depth prediction network predicts the depth values of the initial image frame during initialization, which allows the monocular visual SLAM system to initialize quickly. However, in the absence of information such as photometric model parameters and exposure time, the system exhibits scale drift, which seriously affects its accuracy. Fusing the monocular camera with an IMU effectively solves the scale drift problem. Experimental results show that the proposed algorithm achieves better accuracy and robustness than DSO when information such as photometric model parameters and exposure time is unavailable.

In summary, this dissertation makes an in-depth study of problems in monocular visual SLAM. For the initialization problem, a CNN-based depth prediction network is introduced to predict the depth values of the system's initial frame, so that initialization still succeeds under conditions such as fast rotation; in addition, a brightness value coding sequence of pixel points is designed to strengthen the structural characteristics of the points. For the poor accuracy of monocular visual SLAM in dynamic environments, the depth prediction network predicts the depth values of key image frames, and an image segmentation network identifies potentially dynamic targets in the image frames that affect system accuracy. Finally, CNN and IMU are combined to improve the positioning accuracy of the monocular visual SLAM system. Experimental results show that the proposed algorithm improves the accuracy and robustness of the monocular visual SLAM system.
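
The category mask described in contribution (2) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the class ids, the function names `merge_instance_masks` and `split_key_pixels`, and the toy masks are all assumptions made for illustration, and the motion consistency check that decides the final state of each candidate pixel is not shown.

```python
# Hypothetical class ids; the abstract does not give the actual label values.
BACKGROUND, PERSON, CAR, BUILDING = 0, 1, 2, 3
# Categories treated as potentially dynamic (cars and people, per the abstract).
POTENTIALLY_DYNAMIC = {PERSON, CAR}


def merge_instance_masks(instances, height, width):
    """Combine per-target binary masks into one shared category mask.

    `instances` is a list of (class_id, binary_mask) pairs, where binary_mask
    is a height x width grid of 0/1 values. Each binary mask is multiplied by
    its class id and written into the shared mask.
    """
    label_mask = [[BACKGROUND] * width for _ in range(height)]
    for class_id, binary_mask in instances:
        for v in range(height):
            for u in range(width):
                value = binary_mask[v][u] * class_id
                if value != BACKGROUND:
                    # Later instances overwrite earlier ones where they overlap.
                    label_mask[v][u] = value
    return label_mask


def split_key_pixels(key_pixels, label_mask):
    """Split (u, v) key pixel coordinates into static and candidate lists."""
    static, candidates = [], []
    for u, v in key_pixels:
        if label_mask[v][u] in POTENTIALLY_DYNAMIC:
            candidates.append((u, v))  # still needs a motion consistency check
        else:
            static.append((u, v))
    return static, candidates


# Toy 3 x 4 image with one car and one building segment.
car_mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
building_mask = [[1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
label_mask = merge_instance_masks(
    [(CAR, car_mask), (BUILDING, building_mask)], height=3, width=4
)
static, candidates = split_key_pixels([(0, 0), (1, 1), (3, 2), (2, 2)], label_mask)
```

In this sketch a key pixel on the car segment is only a candidate: a parked car would be expected to pass the motion consistency check and remain usable, which is why the abstract speaks of potentially dynamic rather than dynamic targets.
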
Keywords/Search Tags: Monocular visual SLAM, Depth prediction network, Semantic segmentation, IMU