In the 14th Five-Year Plan, China explicitly identifies the Internet of Vehicles and new-energy vehicles as strategic emerging industries, and autonomous driving technology has been developing rapidly. However, the resulting accidents have seriously hindered this development, and the technology still faces many technical challenges. Under complex domestic road conditions, obstacles, rain, and other adverse weather greatly degrade the quality of the sensor data and can even cause data loss, which significantly reduces the accuracy of target detection during autonomous driving. This paper therefore proposes a new multi-modal, end-to-end network that fuses images with point clouds, extracting image and point cloud features simultaneously through multi-layer perceptrons. The sparse point cloud is completed by exploiting a graph convolutional network's ability to perceive geometric structure, which improves both detection efficiency and accuracy. The research contents and main contributions of this paper are as follows.

First, a bounding-box detection algorithm based on a voting network is proposed. Previous point cloud encoding methods lose a large amount of information, whereas the raw point cloud preserves complete geometric shape information and is robust. The raw point cloud is therefore used as input: point features are extracted by a neural network and matched with scene points, and the aggregation in the voting module is realized through shared-weight convolutions. Through its deep network parameters, the voting module learns more effectively and accurately than traditional methods and can integrate contextual features. By aggregating the features of clustered vote points, noisy or low-quality points are eliminated through voting, which effectively improves the regression accuracy of the detection boxes.

Second, to address the problems that a single sensor is susceptible to interference and that small targets are difficult to detect, a multi-modal fusion network is designed. Instead of relying on a single sensor, it takes in image data and point cloud data at the same time. With known camera calibration, the features of the two modalities are combined through the correspondence between image pixels and point cloud points, making the detector more accurate and robust.

Third, in response to the low resolution of 3D sensors, a module that perceives the geometric information of the point cloud is proposed to complete the defective point cloud. In this perception module, a graph convolutional network builds the points into a large graph, and the model learns the global geometric features of the point cloud through the connections between graph nodes, so that the "missing" parts of the point cloud can be predicted and completed. To further increase detection accuracy, a Transformer-based local optimization method is proposed: a contour-point extraction layer is added to the completion network, the point cloud coordinates are converted into a sequence, and completion is cast as a sequence-to-sequence process handled by a Transformer.
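As a concrete illustration of this sequence-to-sequence formulation, the following is a minimal sketch in PyTorch. The module name ContourSeq2Seq, the dimensions, the learned query embeddings, and the use of torch.nn's standard Transformer encoder/decoder layers are assumptions made for illustration only, not the thesis's actual implementation.

```python
# Minimal sketch: treat the partial point cloud as an input sequence and
# predict a fixed-length sequence of contour (key) points with a Transformer.
# All names and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class ContourSeq2Seq(nn.Module):
    def __init__(self, d_model=128, n_contour=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(3, d_model)                 # lift xyz to d_model
        self.queries = nn.Parameter(torch.randn(n_contour, d_model))
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.head = nn.Linear(d_model, 3)                  # project back to xyz

    def forward(self, partial_xyz):                        # (B, N, 3)
        memory = self.encoder(self.embed(partial_xyz))     # encode input sequence
        queries = self.queries.unsqueeze(0).expand(partial_xyz.size(0), -1, -1)
        contour = self.decoder(queries, memory)            # (B, n_contour, d_model)
        return self.head(contour)                          # predicted contour points


if __name__ == "__main__":
    model = ContourSeq2Seq()
    partial = torch.randn(2, 512, 3)                       # toy partial point cloud
    print(model(partial).shape)                            # torch.Size([2, 64, 3])
```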
During training, a multi-task learning strategy is adopted so that the model updates the parameters for contour-point extraction and for global completion simultaneously. When the loss is back-propagated, the model not only regresses the input-output reconstruction loss but also incorporates the loss of the extracted contour points, which effectively improves the resolution of the completed point cloud; a sketch of this combined objective is given below. Finally, the effectiveness of the proposed methods is verified through multiple sets of experiments, and good results are achieved on the test dataset.
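As referenced above, here is a minimal sketch of the combined multi-task objective, assuming a symmetric Chamfer distance for both the global completion term and the contour-point term, with a hypothetical weighting factor lambda_contour; the thesis does not fix these exact choices in this summary.

```python
# Minimal sketch of the multi-task loss: completion loss + weighted contour loss.
# The Chamfer formulation and lambda_contour are illustrative assumptions.
import torch


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between point sets pred (B, N, 3) and gt (B, M, 3)."""
    diff = pred.unsqueeze(2) - gt.unsqueeze(1)     # (B, N, M, 3) pairwise differences
    dist = (diff ** 2).sum(-1)                     # squared pairwise distances
    return dist.min(2).values.mean() + dist.min(1).values.mean()


def multitask_loss(pred_full, gt_full, pred_contour, gt_contour, lambda_contour=0.5):
    """Global completion loss plus weighted contour-point loss."""
    loss_complete = chamfer_distance(pred_full, gt_full)
    loss_contour = chamfer_distance(pred_contour, gt_contour)
    return loss_complete + lambda_contour * loss_contour


if __name__ == "__main__":
    b = 2
    pred_full, gt_full = torch.randn(b, 256, 3), torch.randn(b, 256, 3)
    pred_ctr, gt_ctr = torch.randn(b, 64, 3), torch.randn(b, 64, 3)
    print(multitask_loss(pred_full, gt_full, pred_ctr, gt_ctr).item())
```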