3D object detection is one of the key environment-perception tasks for autonomous vehicles. Because monocular images lack precise depth information, monocular 3D object detection performs poorly. LiDAR, a common detection and ranging sensor, has been widely adopted in autonomous driving. However, limited by the inherent sparsity of LiDAR point clouds, single-frame 3D object detectors perform poorly on heavily occluded and distant objects. Sequential-frame algorithms can effectively improve detection performance by fusing spatiotemporal information. This paper focuses on 3D object detection from time-series point clouds.

Autonomous driving scenes contain two kinds of motion: ego motion and object motion. An object's positions in different frames must therefore be aligned before its features can be fused for enhancement. In this paper, we propose SFA-Det, a multi-frame detector that uses estimated scene flow to guide feature warping and alignment. We propose new solutions for scene flow estimation and multi-frame data augmentation, chiefly introducing dilated convolution to improve scene flow estimation and preserving inter-frame motion during data augmentation. Experimental results on a self-collected 40-line LiDAR dataset, the KITTI RAW dataset, and the nuScenes dataset show that SFA-Det achieves high detection precision.

The scene-flow-guided multi-frame feature alignment scheme, however, is computationally expensive and difficult to run in real time. To address this, we propose PAD-Net, a sequential-frame detection and prediction network based on foreground-enhanced knowledge distillation. Through knowledge distillation, the complete features that the teacher network extracts from foreground pre-aligned point clouds guide the student network as it extracts features from the original point clouds, so that feature alignment is achieved implicitly. To strengthen the student network's implicit alignment ability, we introduce deformable convolution into it. In addition, we introduce an auxiliary scene flow estimation task that extracts rich motion features, which further improves detection quality and enables object motion prediction. Experimental results show that PAD-Net has a simple structure, runs at up to 28 FPS, and reaches the performance of SFA-Det under real-time conditions.
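To make the idea of scene-flow-guided feature warping concrete, here is a minimal sketch. It is not SFA-Det's implementation: the abstract does not say how features or flow are represented, so this assumes a single-channel bird's-eye-view (BEV) feature grid and a per-cell 2D flow measured in grid cells. The helper names `bilinear_sample` and `warp_with_flow` are hypothetical. The previous frame's features are backward-warped onto the current grid so that an object's features land where the object is now.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]                 # assumed single-channel BEV map, [row][col]
FlowGrid = List[List[Tuple[float, float]]]  # assumed per-cell (d_row, d_col) in cells


def bilinear_sample(feat: Grid, r: float, c: float) -> float:
    """Bilinearly sample feat at fractional (r, c); out-of-range taps count as 0."""
    h, w = len(feat), len(feat[0])
    r0, c0 = math.floor(r), math.floor(c)
    dr, dc = r - r0, c - c0
    total = 0.0
    for rr, wr in ((r0, 1.0 - dr), (r0 + 1, dr)):
        for cc, wc in ((c0, 1.0 - dc), (c0 + 1, dc)):
            if 0 <= rr < h and 0 <= cc < w:   # zero padding outside the grid
                total += wr * wc * feat[rr][cc]
    return total


def warp_with_flow(prev_feat: Grid, flow: FlowGrid) -> Grid:
    """Backward-warp previous-frame features onto the current grid.

    Assumption: flow[r][c] is the displacement (in cells) of whatever occupies
    current cell (r, c) since the previous frame, so it was previously at
    (r - d_row, c - d_col).
    """
    h, w = len(prev_feat), len(prev_feat[0])
    return [
        [bilinear_sample(prev_feat, r - flow[r][c][0], c - flow[r][c][1])
         for c in range(w)]
        for r in range(h)
    ]
```

For example, if an object's feature sits at cell (1, 1) in the previous frame and the flow says everything moved one column to the right, the warped map places that feature at (1, 2), where it can be fused with the current frame's features. Backward warping is used here because it fills every target cell exactly once; forward splatting is a common alternative.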
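One plausible reading of "preserving inter-frame motion during data augmentation" is sketched below. The abstract does not describe the actual scheme, so treat this as an assumption. The idea is to sample a single random global transform (yaw rotation, scaling, translation) per training sequence and apply it identically to every frame. Transforming frames independently would inject fake relative motion between them. All function names and default ranges here are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]
Frame = List[Point]


def transform_point(p: Point, yaw: float, scale: float,
                    shift: Tuple[float, float, float]) -> Point:
    """Rotate about the z-axis by yaw, scale uniformly, then translate."""
    x, y, z = p
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    return (scale * (cos_y * x - sin_y * y) + shift[0],
            scale * (sin_y * x + cos_y * y) + shift[1],
            scale * z + shift[2])


def augment_sequence(frames: List[Frame], rng: random.Random,
                     max_yaw: float = math.pi / 4,
                     scale_range: Tuple[float, float] = (0.95, 1.05),
                     max_shift: float = 0.5) -> List[Frame]:
    """Apply ONE shared random global transform to every frame of a sequence.

    Because all frames get the same transform, the relative motion between
    frames is preserved (it is rotated and scaled consistently, not corrupted).
    The parameter ranges are illustrative assumptions.
    """
    yaw = rng.uniform(-max_yaw, max_yaw)
    scale = rng.uniform(*scale_range)
    shift = tuple(rng.uniform(-max_shift, max_shift) for _ in range(3))
    return [[transform_point(p, yaw, scale, shift) for p in frame]
            for frame in frames]
```

Under this scheme, a point that moves 1 m between two frames still moves `scale` metres after augmentation, and a static point stays at the same augmented position in both frames. Copy-paste (ground-truth sampling) augmentation would need the analogous treatment: pasting an object's whole trajectory rather than a single static box.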
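The distillation step in PAD-Net can be pictured as a feature-imitation loss. In this minimal sketch, the student's features (from the original point cloud) are pulled toward the teacher's features (from the foreground pre-aligned point cloud), with extra weight on foreground cells. The abstract does not give the exact loss. The weighted mean-squared error, the `fg_weight` value, and the name `foreground_distill_loss` are all illustrative assumptions. In a real training loop the teacher features would be detached so that only the student is updated.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # assumed BEV features, [row][col][channel]
Mask = List[List[int]]                # 1 = foreground cell, 0 = background


def foreground_distill_loss(teacher: FeatureMap, student: FeatureMap,
                            fg_mask: Mask, fg_weight: float = 5.0) -> float:
    """Weighted MSE between teacher and student features.

    Each cell gets weight 1 + fg_weight * mask, so foreground cells (where
    multi-frame alignment matters most) dominate the loss. The teacher acts as
    a fixed target here; in training it would be detached from the graph.
    """
    num, den = 0.0, 0.0
    for t_row, s_row, m_row in zip(teacher, student, fg_mask):
        for t, s, m in zip(t_row, s_row, m_row):
            w = 1.0 + fg_weight * m
            num += w * sum((a - b) ** 2 for a, b in zip(t, s))
            den += w
    return num / den if den > 0 else 0.0
```

With a 1×2 grid where the student misses the teacher by 1.0 on one cell, the loss is 6/7 if that cell is foreground and only 1/7 if it is background. The loss therefore penalizes misaligned object features far more than background mismatches.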