In recent years, simultaneous localization and mapping (SLAM) technology has matured, while the demand for semantic understanding of the environment has become increasingly urgent: semantic information is extremely important for the positioning and navigation of mobile robots. This paper proposes an indoor monocular semantic SLAM framework that enables a machine’s understanding of the 3D scene with a simple, low-cost sensor structure. An optimized object detection network is adopted to obtain more accurate semantic information, and a monocular depth estimation network is introduced to overcome the depth estimation bottleneck in ORB-SLAM2. The framework runs successfully in a real scene, which shows that combining deep learning with SLAM technology is feasible. Focusing on the construction of a monocular semantic map, the work in this paper is as follows:

(1) To address the weakness of triangulation-based depth estimation in ORB-SLAM2, deep learning is used to estimate depth, which resolves the contradictions inherent in traditional triangulation and its difficulty under pure rotation. Because applying a fully convolutional network (FCN) directly yields poor depth estimation accuracy, an improved depth estimation network is proposed. It uses ResNet50 as the backbone network, and up-sampling layers in two variants, cascaded and parallel, are designed based on multi-scale convolution so that information from all layers can be integrated to refine the depth prediction results. In addition, the dataset is expanded to improve the generalization ability of the model. It is verified that the parallel up-sampling layer effectively improves the accuracy of depth prediction.

(2) The two-stage Faster R-CNN network is introduced to acquire semantic information from the key frames of ORB-SLAM2; the network can detect nine types of objects. To obtain more refined results, Inception_resnet_v2 is selected as the backbone network of Faster R-CNN and the anchor mechanism in the region proposal network (RPN) is optimized: the properties of the objects in the datasets are counted, and the size and number of anchors are redesigned according to the statistical results. Experiments show that object detection performance can be effectively improved by designing the anchor mechanism appropriately with reference to the datasets.

(3) The indoor monocular semantic SLAM framework designed in this paper is tested in software. The framework is run on the official TUM datasets, which verifies its feasibility. The improved depth estimation network and the optimized object detection network are then tested in a real scene. Experiments show that both networks can be applied well in real scenes; the acquired 2D points are mapped to 3D, and the framework is realized in the real scene, which proves its effectiveness.
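
The step of mapping acquired 2D points to 3D can be illustrated with a minimal sketch. It assumes a standard pinhole camera model and a dense depth map such as one predicted by a depth estimation network; the function name `backproject`, the intrinsics `fx`, `fy`, `cx`, `cy`, and all numeric values are hypothetical and are not taken from the paper.

```python
import numpy as np

def backproject(pixels, depth, fx, fy, cx, cy):
    """Lift (u, v) pixel coordinates to 3D camera-frame points.

    Assumes a pinhole camera; `depth` is an H x W array of metric depths.
    This is an illustrative helper, not the paper's implementation.
    """
    pixels = np.asarray(pixels, dtype=np.float64)
    u, v = pixels[:, 0], pixels[:, 1]
    # Look up depth at the nearest pixel; arrays are indexed [row, col] = [v, u].
    z = depth[np.round(v).astype(int), np.round(u).astype(int)]
    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Illustrative values only: a flat depth map 2 m away and made-up intrinsics.
depth_map = np.full((480, 640), 2.0)
keypoints = [[320.0, 240.0], [420.0, 240.0]]
points_3d = backproject(keypoints, depth_map, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

In a full pipeline the resulting camera-frame points would still need to be transformed by the estimated key-frame pose to place them in the map; that step is omitted here.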