Font Size: a A A

3D Object Detection Algorithm For Road Scene Based On Monocular Vision

Posted on:2022-01-14Degree:MasterType:Thesis
Country:ChinaCandidate:Y J LiFull Text:PDF
GTID:2492306740495424Subject:Precision instruments and machinery
Abstract/Summary:
With the rapid development of artificial intelligence technologies such as deep learning,the system with artificial intelligence has been widely used in many fields,including transportation,security,medicine and education.Especially,the development of autonomous vehicles plays an important role in this field.Environmental perception is a crucial module in the autonomous driving system.Comprehensiveness and accuracy of environmental perception are the prerequisite and guarantee for the safety and intelligence of smart cars.Object detection is the most basic and important research direction in this field,which mainly solves the key issues of “what” and “where”.Although the 2D object detection is becoming more mature,the pixel information is difficult to satisfy actual application requirements.In response to this problem,this paper proposes a 3D object detection algorithm in road scenes just based on monocular images,aims to obtain the object’s position,3D size and orientation in the coordinate of camera.Firstly,this paper analyzes the requirements of the 3D object detection task and establishes the projection relationship between 2D image plane and 3D space under monocular vision.According to these,we designed a 3D object detection network with multi-task branch structure under monocular vision based on the two-stage framework and discussed the design of network structure,regression parameters and loss function in detail.The baseline experiment shows the method has a great work within the KITTI dataset compared with the other method based on monocular vision.Then,this paper provided the method to optimize the model from network structure and the ability of training,and designed a model named FPN-3D based on feature pyramid structure which improved the overall performance of this algorithm for 3D object detection,.Secondly,in order to address the lack of depth information in monocular vision,this paper future designed a 3D object detection framework based on semantic-depth fusion feature.We analyzed the current main semantic segmentation methods and depth estimation methods of monocular vision,and introduced the PSPNet network and U-Net network as the semantic feature encoder and the depth feature encoder respectively.In addition,the multi-level feature fusion method is optimized by introducing the Squeeze-and-Excitation module.Experiments shows that the fusion feature could future improve the performance of our framework.Finally,the experiment was extended within the nu Scenes dataset and the Lyft3 D dataset,which have the richer raw data and the more complex scenes to verify the performance of the method compared with the KITTI dataset.Experiments show that the framework is effective and works for different data sets.
Keywords/Search Tags:deep learning, 3D object detection, monocular vision, semantic segmentation, depth estimation
Related items