
Research On 3D Object Detection Algorithm Based On Deep Learning

Posted on: 2022-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2518306527483084
Subject: Computer Science and Technology
Abstract/Summary:
3D object detection has attracted much attention in computer vision in recent years and has broad application prospects in robotics, autonomous driving, augmented reality and virtual reality. Accurate 3D object detection therefore has considerable research significance and practical value. To avoid the drawbacks of traditional methods and exploit the strong feature-learning capability of deep learning, this thesis studies deep-learning-based 3D object detection. It fuses data from different sources and builds a multi-modal feature fusion framework that compensates for the limited semantic information in single-modal point clouds, thereby improving the detection of distant and occluded objects. The main work and results are as follows:

(1) Single-modal point cloud data carry insufficient semantic information, and point cloud sparsity degrades detection performance. To address this, the thesis proposes a 3D object detection network based on dual-attention multi-modal data fusion (DAMFNet). First, an image feature extraction branch is designed whose multi-layer image semantic features effectively preserve the structure and semantic information of objects. Second, a multi-neighborhood context information extractor for voxels is designed. It enlarges the receptive field of each voxel and fuses several kinds of voxel context, which strengthens the ability of voxel features to represent the spatial structure and semantics of objects and makes the features more robust. Finally, a multi-modal feature fusion module is designed: channel attention fuses the features of the different modalities, and voxel attention enhances the features of valid target objects while suppressing those of irrelevant background. Experimental results on the KITTI dataset show that the network clearly improves detection performance over the baseline algorithm VoxelNet and also outperforms existing mainstream single-modal and multi-modal methods.

(2) Because the conventional residual network is not well suited to the 3D object detection task, the thesis further improves the image branch of DAMFNet. A dilated residual module better suited to 3D object detection is designed: it uses dilated convolution to extract multi-layer semantic features from images while effectively preserving the structural details of distant and small objects in low-resolution feature maps. A full semantic feature module is also designed, in which each feature map is further enhanced with semantic information from all subsequent feature maps. Experiments on the KITTI dataset show that, compared with the baseline DAMFNet, this method effectively improves the detection of distant and hard objects, and that it substantially outperforms many mainstream multi-modal detection methods on such objects.

(3) The thesis proposes a general and robust voxel feature encoder based on the standard Transformer. It addresses a shortcoming of PointNet-based voxel feature extractors, which ignore the spatial relationships and context exchange between points and therefore cannot adaptively extract robust voxel features. First, the order-insensitivity of self-attention on sequence data (it is permutation-equivariant, so reordering the inputs only reorders its outputs) is examined and applied to point cloud processing. Second, a voxel feature layer is built on the Transformer; it adaptively learns a robust local context for each voxel from the spatial relationships and context exchanged among all points in that voxel. Finally, a general 3D object detection framework is built around this voxel feature layer. In particular, the voxel feature layer can be embedded in any other voxel-based 3D object detection framework. Experimental results on the KITTI dataset show that this method achieves excellent 3D object detection performance.
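To make the two attention steps of contribution (1) concrete, here is a minimal pure-Python sketch under stated assumptions; it is not the DAMFNet implementation. It assumes each non-empty voxel already has a point-branch feature and an image feature sampled at its projection, uses an SE-style channel gate with a single linear layer (the weight matrix `w_channel` is hypothetical), and models voxel attention as one sigmoid score per voxel computed from a hypothetical weight vector `w_voxel`.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention_fuse(point_feats: List[Vector],
                           image_feats: List[Vector],
                           w_channel: List[Vector]) -> List[Vector]:
    """Concatenate both modalities per voxel, then reweight each channel.

    SE-style gate (assumption): squeeze by averaging each channel over all
    voxels, excite with ONE linear layer + sigmoid (real SE blocks use two).
    w_channel is a hypothetical C x C weight matrix, C = total channels.
    """
    fused = [p + q for p, q in zip(point_feats, image_feats)]  # list concat
    n, c = len(fused), len(fused[0])
    squeezed = [sum(f[j] for f in fused) / n for j in range(c)]
    gates = [sigmoid(sum(w * s for w, s in zip(row, squeezed)))
             for row in w_channel]
    return [[x * g for x, g in zip(f, gates)] for f in fused]


def voxel_attention(feats: List[Vector], w_voxel: Vector,
                    bias: float = 0.0) -> List[Vector]:
    """Scale each voxel's feature vector by one score in (0, 1).

    w_voxel and bias are hypothetical learned parameters; a low score
    attenuates that voxel (e.g. background), a high score keeps it.
    """
    out = []
    for f in feats:
        score = sigmoid(sum(w * x for w, x in zip(w_voxel, f)) + bias)
        out.append([x * score for x in f])
    return out
```

With untrained weights the gates are arbitrary; in a real network `w_channel` and `w_voxel` would be learned end to end. Because both gates lie in (0, 1), they can only attenuate features, which is one way background voxels can be suppressed relative to target voxels.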
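Contribution (2) relies on dilated convolution to grow the receptive field without further downsampling. The sketch below is plain Python and not taken from the thesis: it computes the receptive field of a stack of convolutions and applies a 1D dilated residual convolution with zero padding, so the output keeps the input length. The 1D setting and the helper names (`receptive_field`, `dilated_residual_1d`) are simplifications chosen for illustration; the thesis's module operates on 2D image feature maps.

```python
from typing import Iterable, List, Tuple


def effective_kernel(k: int, d: int) -> int:
    """A k-tap kernel with dilation d spans d*(k-1)+1 input samples."""
    return d * (k - 1) + 1


def receptive_field(layers: Iterable[Tuple[int, int, int]]) -> Tuple[int, int]:
    """layers: (kernel, stride, dilation) per conv, input to output.

    Returns (receptive field in input samples, cumulative stride).
    """
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (effective_kernel(k, d) - 1) * jump  # growth scaled by spacing so far
        jump *= s
    return rf, jump


def dilated_residual_1d(x: List[float], w: List[float], d: int) -> List[float]:
    """y[i] = x[i] + sum_t w[t] * x[i + (t - c) * d], zero padded.

    c is the centre tap (odd-length kernel assumed), so len(y) == len(x):
    the receptive field grows with d but the resolution does not shrink.
    """
    c, n = len(w) // 2, len(x)
    y = []
    for i in range(n):
        acc = 0.0
        for t, wt in enumerate(w):
            j = i + (t - c) * d
            if 0 <= j < n:  # zero padding outside the signal
                acc += wt * x[j]
        y.append(x[i] + acc)  # identity shortcut of the residual block
    return y
```

For example, three 3-tap layers with dilations 1, 2 and 4 reach a 15-sample receptive field at stride 1. Reaching the same 15 samples with ordinary 3-tap convolutions in three layers takes two stride-2 layers and leaves the feature map 4 times coarser, which is the kind of resolution loss that hurts small, distant objects.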
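For contribution (3), the sketch below illustrates the general idea of a Transformer-style voxel feature layer in pure Python: single-head scaled dot-product self-attention over the points of one voxel, a residual connection, and a channel-wise max over points. The projection matrices `wq`, `wk` and `wv` are hypothetical placeholders, and details of the thesis's actual layer (number of heads, normalization, feed-forward sublayer, positional encoding) are not given in the abstract, so they are omitted here.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x k) @ (k x m) matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def voxel_self_attention(points: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> List[float]:
    """Encode the N points of ONE voxel into a single feature vector.

    points: N x D per-point features (e.g. x, y, z, ...). wq, wk: D x Dk;
    wv: D x D so the residual connection lines up. All weights hypothetical.
    """
    q, k, v = matmul(points, wq), matmul(points, wk), matmul(points, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    # scores[i][j]: how strongly point i attends to point j in the same voxel
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    attn = [softmax(r) for r in scores]
    context = matmul(attn, v)  # every point now mixes information from all points
    updated = [[p + c for p, c in zip(pt, ctx)] for pt, ctx in zip(points, context)]
    # channel-wise max over points: the result no longer depends on point order
    return [max(col) for col in zip(*updated)]
```

Self-attention on its own is permutation-equivariant; the final max-pool turns that into a voxel feature that is independent of point order, which is what an encoder over unordered points needs. Unlike a PointNet-style encoder, which transforms each point independently before pooling, every point's feature here is updated from all other points in the voxel first.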
Keywords/Search Tags: 3D object detection, adaptive fusion, multi-modal data fusion, attention mechanism, multi-neighborhood features, self-attention network, Transformer, point cloud, codec