
Three-Dimensional Object Detection Based On Deep Learning

Posted on: 2024-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: X X Huang
Full Text: PDF
GTID: 2568307133951719
Subject: Traffic Information Engineering & Control
Abstract/Summary:
Three-dimensional (3D) object detection is one of the key technologies in machine perception: it interprets the surrounding environment and recovers the 3D information of nearby objects. It is widely used in practical applications such as autonomous driving, smart cities, and intelligent robots. However, the loss of point cloud feature information often degrades 3D detection performance. To address this, this thesis proposes feeding multi-scale features into the detection head of a 3D object detection network to improve detection performance. First, after a review of 3D object detection methods and frameworks in China and abroad, grid-structured point cloud data is chosen as the input to the detection network used in this thesis. Subsequent research on point cloud object detection found that existing feature extraction methods often lose point cloud feature information. Therefore, a method is proposed that strengthens global and local feature extraction using attention mechanisms and graph convolutions, and the directions for improvement are analyzed further.

Second, to improve the attention mechanism, the Transformer, which performs well at global feature extraction, is selected as the basis, and the improved network is named DRPT. A Transformer network for point clouds is built that uses self-attention to establish correlations among points. Normalization is then performed with doubly stochastic matrices to strengthen global feature extraction. To verify its effectiveness, experiments were conducted on the ModelNet40 and ShapeNetPart datasets; compared with the baseline network, DRPT improved accuracy by 5.6% and 5.5%, respectively.

The graph convolution used to strengthen local feature extraction is also improved, and the improved network is named 3DGGCN. This network first inserts a grid query module, which further improves the accuracy and stability of local features while preserving local information. Deformable convolution kernels are then introduced; they adapt the convolution kernel to the number of points, further enhancing point cloud feature extraction. To verify the improved model's ability to process large point cloud scenes, experiments were conducted on the SemanticKITTI and Semantic3D datasets; compared with the baseline network, 3DGGCN improved accuracy by 3.9% and 6.2%, respectively, and it outperformed the baseline model on every aspect evaluated.

Finally, the improved attention mechanism and graph convolution are combined into a feature enhancement layer, which is integrated with the PointPillars network to perform 3D object detection; the resulting network is named TG-Pillars. It uses DRPT and 3DGGCN to extract global and local point cloud features, respectively, and combines the two through multi-scale feature fusion to address the lack of geometric features in 3D object detection networks. On the KITTI dataset, TG-Pillars improved accuracy by 2.16% for vehicles, 3.84% for pedestrians, and 2.09% for cyclists. In subsequent field applications, the model was deployed for real-time object detection on LiDAR point clouds in ROS, which suggests broad application prospects in autonomous driving, smart cities, intelligent robots, and similar settings.
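The grid-structured input and the PointPillars backbone mentioned above start by grouping LiDAR points into vertical x-y columns ("pillars"). The sketch below shows that step in plain Python. It assumes a (x, y, z, reflectance) point layout, and it assumes the 9-value point decoration (offsets to the pillar's point mean and to the pillar's x-y centre) as we understand it from the original PointPillars paper. The ranges, the 0.16 m pillar size, and the cap of 32 points are illustrative defaults, not values taken from this thesis. `pillarize` and `decorate` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed point layout: (x, y, z, reflectance), as in KITTI-style LiDAR scans.
Point = Tuple[float, float, float, float]
PillarKey = Tuple[int, int]


def pillarize(points: List[Point],
              x_range: Tuple[float, float] = (0.0, 69.12),
              y_range: Tuple[float, float] = (-39.68, 39.68),
              pillar_size: float = 0.16,
              max_points: int = 32) -> Dict[PillarKey, List[Point]]:
    """Group points into vertical x-y columns ("pillars") on a regular grid."""
    pillars: Dict[PillarKey, List[Point]] = defaultdict(list)
    for p in points:
        x, y = p[0], p[1]
        # Drop points outside the (illustrative) detection range.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        key = (int((x - x_range[0]) // pillar_size),
               int((y - y_range[0]) // pillar_size))
        # Keep at most max_points points per pillar.
        if len(pillars[key]) < max_points:
            pillars[key].append(p)
    return dict(pillars)


def decorate(pillars: Dict[PillarKey, List[Point]],
             x_min: float, y_min: float,
             pillar_size: float) -> Dict[PillarKey, List[Tuple[float, ...]]]:
    """Append offsets to the pillar's point mean (3 values) and x-y centre (2 values)."""
    out: Dict[PillarKey, List[Tuple[float, ...]]] = {}
    for (ix, iy), pts in pillars.items():
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        mz = sum(p[2] for p in pts) / n
        cx = x_min + (ix + 0.5) * pillar_size   # pillar centre in x
        cy = y_min + (iy + 0.5) * pillar_size   # pillar centre in y
        out[(ix, iy)] = [(x, y, z, r, x - mx, y - my, z - mz, x - cx, y - cy)
                         for (x, y, z, r) in pts]
    return out


if __name__ == "__main__":
    scan = [(0.05, 0.05, 0.0, 0.9), (0.10, 0.20, 1.0, 0.4),
            (1.0, 1.0, 0.5, 0.1), (-5.0, 0.0, 0.0, 0.2)]
    grid = pillarize(scan, x_range=(0.0, 10.0), y_range=(0.0, 10.0), pillar_size=0.5)
    print(sorted(grid))   # two occupied pillars; the point at x = -5.0 is out of range
```

In the full network, each decorated pillar would then pass through a small shared network and be scattered back onto the x-y grid as a 2D pseudo-image; that part is omitted here.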
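The abstract says DRPT normalizes with doubly stochastic matrices but does not say how. One standard way to obtain such a matrix is Sinkhorn-Knopp iteration: exponentiate the scores, then alternately normalize rows and columns. The sketch below assumes that choice and applies it to point self-attention in place of the usual row-wise softmax. It is not DRPT's code; `sinkhorn` and `doubly_stochastic_attention` are hypothetical names, and the iteration count is an assumption.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def sinkhorn(scores: Matrix, n_iters: int = 100) -> Matrix:
    """Map an N x N score matrix to an (approximately) doubly stochastic one.

    After the first row step this equals a row-wise softmax; further row and
    column steps push both row sums and column sums towards 1.
    """
    peak = max(max(row) for row in scores)          # shift for numerical stability
    k = [[math.exp(s - peak) for s in row] for row in scores]
    n = len(k)
    for _ in range(n_iters):
        k = [[v / sum(row) for v in row] for row in k]                       # rows -> 1
        col_sums = [sum(k[i][j] for i in range(n)) for j in range(n)]
        k = [[k[i][j] / col_sums[j] for j in range(n)] for i in range(n)]    # columns -> 1
    return k


def doubly_stochastic_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Self-attention over N point features x (N x d_in) with Sinkhorn normalization."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wq[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    attn = sinkhorn(scores)           # replaces softmax(scores) along each row
    return matmul(attn, v)


if __name__ == "__main__":
    pts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.5], [0.2, 0.3, 1.0]]
    eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(doubly_stochastic_attention(pts, eye, eye, eye))
```

Because every column of the attention matrix sums to 1, each point's value vector is distributed with total weight 1 across the outputs, so no single point can dominate or vanish from the aggregated features; with softmax alone, only the rows are constrained.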
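The abstract does not detail 3DGGCN's grid query module or its deformable kernels. For orientation, the sketch below shows a generic graph convolution for local point features in the EdgeConv style (from DGCNN), which is a swapped-in, well-known technique, not the thesis's method. Each point gathers its k nearest neighbours, forms edge features [x_i, x_j - x_i], applies a shared linear map with ReLU, and max-pools over the neighbours. `knn` and `edge_conv` are hypothetical names, and the weights would normally be learned.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def knn(points: List[Vec], k: int) -> List[List[int]]:
    """Indices of the k nearest neighbours of each point (excluding the point itself)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in dists[:k]])
    return result


def edge_conv(points: List[Vec], weight: List[Vec], k: int) -> List[List[float]]:
    """EdgeConv-style layer.

    weight has one row per output channel and 2 * dim columns; it is applied
    to the edge feature [x_i, x_j - x_i], followed by ReLU and a max over the
    k neighbours of x_i.
    """
    neighbours = knn(points, k)
    out = []
    for i, p in enumerate(points):
        pooled = [-math.inf] * len(weight)
        for j in neighbours[i]:
            # Edge feature: the centre point plus its offset to the neighbour.
            edge = list(p) + [qc - pc for qc, pc in zip(points[j], p)]
            h = [max(0.0, sum(w * e for w, e in zip(row, edge))) for row in weight]
            pooled = [max(a, b) for a, b in zip(pooled, h)]   # max aggregation
        out.append(pooled)
    return out


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.1), (5.0, 5.0, 5.0)]
    w = [[1.0, 0.0, 0.0, 1.0, 0.0, 0.0],   # channel 0: neighbour's x coordinate
         [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]   # channel 1: z offset to the neighbour
    print(edge_conv(pts, w, k=2))
```

The max over neighbours keeps the layer independent of neighbour order, and the x_j - x_i term is what carries local geometry.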
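TG-Pillars is described as fusing global and local features through multi-scale feature fusion, without further detail. As a point of reference, the PointPillars 2D backbone upsamples feature maps computed at several strides to a common resolution (with learned transposed convolutions) and concatenates them along the channel axis. The sketch below assumes that pattern but substitutes nearest-neighbour repetition for the learned upsampling to stay short. Feature maps are nested lists of shape channels x height x width, and `fuse_multiscale` is a hypothetical name.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]   # channels x height x width


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Nearest-neighbour upsampling: repeat every cell factor x factor times."""
    return [[[v for v in row for _ in range(factor)]
             for row in channel for _ in range(factor)]
            for channel in fmap]


def fuse_multiscale(maps: List[Tuple[FeatureMap, int]]) -> FeatureMap:
    """Bring (feature map, stride) pairs to the finest stride and concatenate channels."""
    finest = min(stride for _, stride in maps)
    fused: FeatureMap = []
    for fmap, stride in maps:
        if stride % finest:
            raise ValueError("strides must be multiples of the finest stride")
        fused.extend(upsample_nearest(fmap, stride // finest))
    height, width = len(fused[0]), len(fused[0][0])
    if any(len(c) != height or len(c[0]) != width for c in fused):
        raise ValueError("upsampled maps do not share a spatial size")
    return fused


if __name__ == "__main__":
    fine = [[[float(4 * r + c) for c in range(4)] for r in range(4)]]   # 1 x 4 x 4, stride 1
    coarse = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]       # 2 x 2 x 2, stride 2
    fused = fuse_multiscale([(fine, 1), (coarse, 2)])
    print(len(fused), len(fused[0]), len(fused[0][0]))   # channels, height, width
```

Concatenation keeps both the fine-grained and the coarse, larger-context channels available to the detection head, which is the motivation the abstract gives for feeding multi-scale features into it.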
Keywords: 3D object detection, attention mechanism, Transformer, graph convolution, feature fusion