Object detection and positioning are fundamental tasks in autonomous driving. Underground garages are an important application scenario for automated parking and require highly reliable positioning. However, because GPS signals are unavailable, textures are weak, and lighting conditions are complex, traditional positioning methods achieve poor accuracy in underground garages. Monocular 3D object detection is low in cost, applies to a wide range of scenarios, and can provide useful semantic information for positioning, but because it lacks direct 3D depth information, its performance still lags far behind methods based on LiDAR point clouds. Motivated by these observations, this paper studies monocular 3D object detection and monocular semantic SLAM in the underground garage scene, and provides an effective monocular 3D object detection method and a feasible solution for reliable monocular visual positioning in this scene. The main research contents and contributions are as follows.

First, based on the characteristics of the underground garage scene and the static objects in it, this paper proposes Pillar Net, a monocular 3D object detection method based on prior geometric constraints. Three losses are introduced: a rotation consistency loss, an orderliness loss, and a homography loss. The rotation consistency loss exploits the consistent orientation of the pillars in the scene, the orderliness loss exploits the orderly arrangement of the pillars, and the homography loss exploits the flatness of the ground. These losses impose strong geometric constraints on monocular image data, which lacks 3D perception. Experiments on a self-built dataset show that the method significantly improves accuracy, which verifies the effectiveness of the design concept and the method.

Second, because GPS signals are difficult to obtain in underground garages, SLAM methods are well suited to the positioning task. However, under weak textures and complex lighting conditions, traditional visual SLAM methods perform poorly. To address this problem, this paper proposes OP-SLAM (Object and Point based SLAM), a monocular visual semantic SLAM framework that fuses 3D object size and pose information. Unlike traditional visual SLAM methods, OP-SLAM takes the 3D bounding boxes of objects detected by Pillar Net and couples them into visual odometry, map building, and back-end optimization, which significantly improves the accuracy of the SLAM trajectory. Because the objects in the OP-SLAM map can be used to correct the accumulated error of visual SLAM in long-term positioning tasks, they are retained as a long-term object map. Experiments on the self-built dataset and the KITTI dataset verify the effectiveness of OP-SLAM on SLAM tasks and long-term positioning tasks in different underground garage scenes, as well as the adaptability of the framework to ordinary road scenes.
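
To make the idea behind the rotation consistency loss concrete, the following is a minimal sketch of one way such a term could be written. It is an illustration under stated assumptions, not the formulation used in this paper: the function name `rotation_consistency_loss`, the `symmetry` parameter, and the choice of circular variance as the penalty are all hypothetical. The sketch assumes that the predicted yaw angles of all pillars in one image are available in radians and that they should agree with one another.

```python
import numpy as np


def rotation_consistency_loss(yaws, symmetry=1):
    """Circular variance of predicted yaw angles, in [0, 1].

    Hypothetical illustration: 0 means all orientations agree,
    values near 1 mean the orientations are spread out.
    """
    yaws = np.asarray(yaws, dtype=float)
    if yaws.size == 0:
        # No detected pillars: nothing to constrain.
        return 0.0
    # Map each angle to a unit vector in the complex plane. Multiplying
    # by `symmetry` treats orientations that differ by 2*pi/symmetry as
    # identical (an assumption, e.g. symmetry=4 for square cross-sections).
    unit_vectors = np.exp(1j * symmetry * yaws)
    # The mean vector has length 1 only when all unit vectors coincide.
    return float(1.0 - np.abs(unit_vectors.mean()))


aligned = rotation_consistency_loss([0.30, 0.31, 0.29])
scattered = rotation_consistency_loss([0.0, 1.2, 2.5])
print(f"aligned: {aligned:.4f}, scattered: {scattered:.4f}")
```

In a training setting, a term of this kind would typically be added to the detection loss with a weighting factor, and written with a differentiable framework rather than NumPy.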