| 3D object detection is a core technology of the perception system in autonomous driving. Its main task is to detect objects in the 3D environment around the vehicle, providing a basis for the decisions of the autonomous driving system. 3D object detection algorithms that take point cloud data as input have developed rapidly because they balance accuracy against model complexity. However, because point cloud data is highly complex, current 3D object detection algorithms still have two problems: the backbone network has insufficient capacity to represent complex data, and the feature maps used for bounding box regression have only a single scale, so the accuracy of bounding box regression needs improvement. In addition, point cloud targets of the same class differ significantly in their point cloud characteristics at different distances and in different scenes, so supervision by ground truth alone does not provide enough information, and the model cannot adequately learn the differences between targets of the same class. To address these problems, this paper proposes two methods. The specific research contents are as follows: (1) The first work proposes PVFormer, a 3D object detection algorithm based on a window-partitioned Transformer, which strengthens the feature extraction ability of the backbone network and makes bounding box regression more accurate. The model works as follows. First, a self-attention-based Transformer serves as the core of the backbone network, where it can better model complex input data and extract richer semantic features. Through top-down feature fusion, targets are detected on feature maps of the corresponding scales, which generates high-quality bounding boxes. The window partitioning mechanism ensures that the Transformer has a sufficient receptive field at a reasonable computational cost. Second, background points are filtered
before downsampling the point cloud, which raises the probability that points on targets are sampled as key points and provides richer key point features for the second-stage regression of the bounding boxes. Finally, an orientation-aware DIoU loss function constrains bounding box regression, which improves regression quality. Experiments on the KITTI dataset show that PVFormer’s mAP is 1.66%, 3.32% and 2.58% higher than that of PV-RCNN++ in the car, pedestrian and cyclist categories, respectively. (2) The second work proposes SE-PVFormer, a 3D object detection algorithm based on self-ensemble learning, which lets the model learn the varied features of targets of the same class more effectively. The algorithm constructs a teacher model and a student model with the same structure, both using a pre-trained PVFormer. During training, the predictions of the teacher model and the ground truth jointly supervise the student model. The teacher’s predictions contain richer information, which helps guide the student to learn the differences between targets of the same class. To further improve generalization, a shape-aware data augmentation mechanism encourages the student to learn the features of targets in different states, and a space-aware data augmentation mechanism inserts additional training targets into the training data at more reasonable positions. Experiments on the KITTI dataset show that SE-PVFormer improves mAP over PVFormer by a further 1.01%, 2.05% and 1.66% in the car, pedestrian and cyclist categories, respectively. |
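
The DIoU loss named in the first work can be illustrated with a minimal sketch. The abstract does not give the form of the orientation-aware term, so the code below implements only the standard DIoU loss, 1 - IoU + d²/c², for axis-aligned 2D boxes (for example, bird's-eye-view footprints). The function name `diou_loss` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration, not the paper's implementation.

```python
def diou_loss(box_a, box_b):
    """Standard DIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area. The orientation-aware term
    mentioned in the abstract is NOT modeled here.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area and IoU.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)

    # d^2: squared distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
       + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # c^2: squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
       + (max(ay2, by2) - min(ay1, by1)) ** 2

    return 1.0 - iou + d2 / c2
```

Unlike a plain IoU loss, the center-distance term d²/c² stays informative when the predicted and ground-truth boxes do not overlap, which is one common motivation for using DIoU in bounding box regression.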