
3D Object Detection Algorithm Based On Point Cloud And Image Multi-stage Fusion

Posted on: 2023-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: G P Yang
Full Text: PDF
GTID: 2568306911485654
Subject: Computer application technology
Abstract/Summary:
3D object detection is a key technology in autonomous driving environment perception, providing the basis for safe and reliable autonomous driving. Lidar and color (RGB) cameras are the two most important sensors on autonomous vehicles, acquiring point clouds and images respectively. The point cloud contains accurate distance information but is sparse and disordered; in object detection it is difficult to detect low-resolution or occluded objects from it, leading to false and missed detections. The image contains RGB color values and detailed texture and boundary information, but because objects appear large when near and small when far, and because the image lacks distance information, the precise 3D position of an object cannot be estimated from the image alone. This thesis fuses point clouds and images to improve the accuracy of 3D object detection, applying a weighted point-wise feature fusion method, a multi-source adaptive fusion method, and a Transformer-based fusion method at different stages. The specific work is as follows:

(1) Aiming at false detections caused by the inconsistency between object classification and localization in fusion algorithms, a 3D object detection method based on point-wise fusion of weighted features is proposed. The method first trains a 2D object detection model to obtain multi-scale semantic feature maps, and matches a semantic feature vector to each point in the point cloud space using the mapping matrix between point cloud coordinates and pixel coordinates. It then estimates a weight vector from the concatenation of the point feature vector and the semantic feature vector, and uses this weight vector to select the more effective features with which to expand the point feature dimension, yielding a point set fused with multi-scale semantic features. Finally, the point cloud space is partitioned into several cylinders, and all fused point features within each cylinder are aggregated into a global feature representation of that cylinder, i.e., a bird's-eye-view feature map, from which the classification confidence and position/pose of the target are regressed. Experiments show that expanding the point feature dimension with channel-weighted semantic features effectively avoids the feature matching errors caused by spatial position errors, and that the proposed multi-scale fusion structure avoids the failure of single-scale semantic features to satisfy targets at different resolutions, which need information at different ranges. Overall, the fusion algorithm significantly corrects classification and localization deviations and improves 3D object detection accuracy.

(2) Aiming at the false detection of low-resolution targets in point cloud space, a 3D object detection method based on multi-source adaptive feature fusion is proposed. The method first uses 2D and 3D convolutional neural networks to encode images and voxels at multiple scales, and then designs a dense fusion structure that serially aggregates image and voxel features at different scales onto the point features. An object-point probability map is estimated from the multi-scale feature maps, and the probabilities are projected onto all points in the point cloud space via the mapping relationship to obtain the foreground point set. The voxel feature map is then used to estimate candidate regions and their key points, and columnar region pooling forms a unified candidate-region feature representation from which the classification confidence and position/pose of the target are regressed. Experiments show that probability-mapping-based foreground point estimation and columnar region pooling effectively increase the proportion of key-point information in the detection task, avoiding the information loss caused by too few representative points for low-resolution targets. Overall, the effective combination of the three kinds of information improves the detection accuracy of low-resolution objects.

(3) Aiming at missed detections caused by target occlusion in complex point cloud scenes, a 3D object detection method based on candidate-region Transformer fusion is proposed. The method first uses a classical point cloud detection network to obtain candidate regions, and applies positional encoding to the points and the corresponding pixel information within each candidate box, introducing both spatial distance and planar distance information. A multi-head encoding structure is then used to expand the feature representation; finally, a learnable query vector in the decoding structure estimates channel weights and aggregates the encoded point vectors into a unified feature vector, i.e., a global feature representation used to estimate the class and predict the target box. Experiments show that the attention-based encoding structure effectively extracts the spatial structural features of local point sets, and that the overall fusion structure significantly improves multi-target discrimination in complex scenes.
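The point-wise weighted fusion in (1) can be sketched as follows. This is an illustrative reconstruction, not the thesis code: the 3×4 projection matrix `P`, nearest-pixel lookup, and the single sigmoid-gated weight layer `Wg` standing in for the weight-estimation network are all assumptions made for the sketch.

```python
import numpy as np

def project_points(points_xyz, P):
    """Project N x 3 lidar points to pixel coordinates with a 3 x 4 matrix P."""
    homo = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])  # N x 4
    uvw = homo @ P.T                                               # N x 3
    return uvw[:, :2] / uvw[:, 2:3]                                # perspective divide

def gather_semantic(uv, feat_map):
    """Match a semantic feature vector to each point by nearest-pixel lookup."""
    H, W, _ = feat_map.shape
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    return feat_map[v, u]                                          # N x C

def weighted_fusion(point_feat, sem_feat, Wg):
    """Estimate channel weights from the concatenated vectors, gate the
    semantic features, and expand the point feature dimension with them."""
    concat = np.concatenate([point_feat, sem_feat], axis=1)        # N x (D + C)
    w = 1.0 / (1.0 + np.exp(-(concat @ Wg)))                       # sigmoid gate, N x C
    return np.concatenate([point_feat, w * sem_feat], axis=1)      # N x (D + C)
```

In a real pipeline `gather_semantic` would be repeated once per feature-map scale, with the gated vectors from all scales concatenated onto the point feature.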
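The cylinder/columnar aggregation used in (1) and the pooling in (2) share one idea: scatter per-point features into vertical columns over a ground-plane grid and pool each column into one cell of a bird's-eye-view map. A minimal sketch, assuming per-channel max pooling and a regular grid (both illustrative choices; the thesis does not fix them here):

```python
import numpy as np

def column_bev(points_xy, point_feat, grid=(8, 8), cell=1.0):
    """Pool fused point features column-wise into a BEV feature map.

    points_xy  : N x 2 ground-plane coordinates
    point_feat : N x C fused per-point features
    Returns an H x W x C bird's-eye-view map (zeros where a column is empty).
    """
    H, W = grid
    bev = np.zeros((H, W, point_feat.shape[1]))
    ix = np.clip((points_xy[:, 0] / cell).astype(int), 0, W - 1)
    iy = np.clip((points_xy[:, 1] / cell).astype(int), 0, H - 1)
    for f, x, y in zip(point_feat, ix, iy):
        bev[y, x] = np.maximum(bev[y, x], f)   # per-channel max over the column
    return bev
```

The resulting BEV map is what the detection head regresses classification confidence and box parameters from.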
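The decoding step in (3) — aggregating the encoded point vectors of a candidate region into one global feature via a learnable query — can be sketched as single-head dot-product attention. The single head and the scaling constant are simplifications; the thesis uses a multi-head structure.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def query_decode(encoded, query):
    """Aggregate N encoded point vectors (N x C) into one C-dim global
    feature: attention scores of a learnable query against each point,
    softmax over points, then a weighted sum."""
    scores = encoded @ query / np.sqrt(len(query))  # one score per point
    attn = softmax(scores)                          # weights sum to 1
    return attn @ encoded                           # C-dim global feature
```

The global feature then feeds the heads that estimate the class and predict the target box for that candidate region.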
Keywords/Search Tags: 3D object detection, Multi-sensor fusion, Lidar point cloud, Feature fusion, Attention mechanism, Deep learning