3D object detection is a fundamental task in autonomous driving, intelligent security, and night-assisted driving. It helps robots, electric vehicles, and other artificial intelligence products better understand their surroundings and gives them a more accurate basis for decision making. Traditional 3D object detection algorithms rely on hand-crafted features, so their detection accuracy is low and their scope of application is limited. The rapid development of deep learning has opened a new research direction for 3D object detection: multimodal 3D object detection algorithms can extract more robust features, and the overall detection level has improved greatly. Research on improving and optimizing deep-learning-based 3D object detection for point clouds and RGB images is therefore of positive significance for improving detection accuracy and model generalization. The specific research work in this thesis is as follows:

(1) To address the low signal-to-noise ratio of RGB images and point cloud data and the sparsity of point clouds, we study image and point cloud denoising and point cloud completion algorithms. First, a filtering algorithm denoises the RGB images and the point cloud data as a preprocessing step. Second, a two-stage point cloud RGB sequence fusion algorithm (DBRF) is proposed for fine-grained completion of the point cloud. Finally, different semantic segmentation network strategies are used to demonstrate that their segmentation score vectors effectively improve 3D object detection accuracy.

(2) Existing 3D object detection methods rely on operations such as convolution and pooling, which consider only high-frequency information and ignore other frequency features and structural relationships. Effectively exploiting the multi-frequency information
between point columns is the key to improving 3D object detection performance, so a multi-frequency collaborative network (MFC) is proposed for the 3D object detection task. First, a multi-frequency feature aggregation sub-module based on the discrete cosine transform addresses dense multi-object detection by refining feature maps at different frequencies. Second, a frequency response sub-module based on learnable variable symbols selects the frequencies with the better response from among multiple frequencies, enhancing the important features of the detected objects and suppressing interference from the surrounding environment. Finally, a single-stage 3D object detection algorithm, DPRF-MFC, is proposed that performs 3D object detection with MFC embedded.

(3) Building on the above research, a multimodal 3D object detection algorithm based on decision-level fusion is proposed; fusing RGB images with LiDAR point clouds makes the information of the two modalities complementary. First, because the characteristics of point cloud and RGB image data each limit how far accuracy can be improved, fused detection on point clouds and RGB images is considered, and decision-level fusion, a form of late fusion, is adopted as the basis of the proposed algorithm. Second, decision-level fusion requires the input 2D and 3D detectors to skip the NMS stage so that a large number of detection boxes are obtained. Res2Net is therefore introduced into the algorithm in Chapter 5 to improve the feature extraction network, giving the detection network multi-scale feature learning capability and raising its recall when the threshold is lowered. Finally, the geometric-consistency method based on IoU uses only the overlapping area of the 2D and 3D detection boxes as its criterion; when the recall of the 3D detection network is high, the 3D boxes show
deviation when projected onto the image plane, which leads to false detections. The geometric-consistency IoU is therefore improved: CIoU is used as the basis for judging positive samples and as a further criterion for judging whether a detection box is a true positive.

The experiments in this thesis show that the improved 3D object detection algorithm effectively improves 3D detection accuracy in all scenes for the three object classes Car, Pedestrian, and Cyclist. Decision-level fusion of point clouds and RGB images further improves detection accuracy.
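
To illustrate why CIoU is a stricter consistency criterion than plain IoU, the following is a minimal sketch of the standard Complete-IoU formula for two axis-aligned 2D boxes, such as a 2D detection box and a 3D box already projected onto the image plane. It is not the thesis implementation: the function name `ciou`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for this example.

```python
import math


def ciou(box_a, box_b, eps=1e-9):
    """Complete IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; box format and eps are assumptions.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Widths and heights of both boxes.
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Plain IoU: overlap area divided by union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = wa * ha + wb * hb - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    # CIoU subtracts the center-distance and aspect-ratio penalties from IoU.
    return iou - rho2 / c2 - alpha * v
```

Unlike IoU, which is zero for every pair of non-overlapping boxes, this score keeps decreasing as the centers move apart or the aspect ratios diverge, so a projected 3D box that has drifted away from its 2D counterpart is penalized even when some overlap remains.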