Real-time Object Detection Based On Cascaded Neural Network

Posted on: 2020-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: X Z Ma
GTID: 2428330596982428
Subject: Software engineering
Abstract/Summary:
In recent years, with the development of computer vision and deep learning, many impressive methods have been proposed for accurate 2D object detection. However, beyond 2D bounding boxes or pixel masks, 3D object detection is in high demand in applications such as autonomous driving and robotics, because it describes objects in a more realistic way. Because LiDAR provides reliable depth information that can be used to accurately localize objects and characterize their shapes, many approaches take LiDAR point clouds as input and achieve impressive detection results in autonomous driving scenarios. In contrast, other studies are devoted to replacing LiDAR with cheaper cameras, which are readily available in daily life. Since LiDAR is much more expensive, and inspired by the remarkable progress in image-based depth prediction, this thesis focuses on high-performance 3D object detection using only monocular images.

In this thesis, we propose a monocular 3D detection framework for autonomous driving. Unlike previous image-based methods, which rely on RGB features extracted from 2D images, our method solves the problem in the reconstructed 3D space in order to exploit 3D context explicitly. To this end, we first use a standalone module to transform the input data from the 2D image plane into a 3D point cloud space, which is a better representation. We then perform 3D detection with a PointNet backbone to obtain each object's 3D location, dimensions, and orientation. To enhance the discriminative capability of the point clouds, we also propose a multi-modal feature fusion module that embeds the complementary RGB cue into the generated point cloud representation. We argue that it is more effective to infer 3D bounding boxes from the generated 3D scene space (i.e., the X, Y, Z space) than from the image plane (i.e., the R, G, B image plane). Evaluation on the challenging KITTI dataset shows that our approach outperforms the state-of-the-art monocular approach by a large margin, improving AP by around 15% (absolute) on both the 3D localization and 3D detection tasks for the Car category at an IoU threshold of 0.7.
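The abstract does not say how the standalone module lifts image data into 3D. A common choice, assumed here, is to predict a dense depth map with a monocular depth estimator and back-project every pixel through the pinhole camera model, X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy. This minimal sketch shows only that back-projection step. The function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions. Points come out in the camera frame (x right, y down, z forward, as in KITTI's rectified camera coordinates).

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_point_cloud(depth: List[List[float]], fx: float, fy: float,
                         cx: float, cy: float) -> List[Point]:
    """Back-project a dense depth map into camera-frame 3D points.

    Hypothetical helper: assumes `depth[v][u]` holds the predicted depth Z
    (in meters) of pixel column u, row v, and that the camera intrinsics
    (focal lengths fx, fy and principal point cx, cy) are known.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):          # v: image row index
        for u, z in enumerate(row):          # u: image column index
            if z <= 0:
                continue                     # skip pixels without a valid depth
            x = (u - cx) * z / fx            # pinhole model, horizontal axis
            y = (v - cy) * z / fy            # pinhole model, vertical axis
            points.append((x, y, z))
    return points


# Usage: a 2x2 depth map with one invalid (zero-depth) pixel.
cloud = depth_to_point_cloud([[2.0, 0.0], [4.0, 1.0]], fx=2.0, fy=2.0, cx=0.5, cy=0.5)
print(cloud)
```

This yields one point per valid pixel, so the resulting cloud is far denser than a LiDAR scan. Subsampling or cropping to a region of interest may be needed before detection.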
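The abstract describes a multi-modal fusion module that embeds the RGB cue into the generated point cloud, but it does not specify the module's design. The simplest baseline is to attach each pixel's normalized color to the point back-projected from it, giving six-channel (x, y, z, r, g, b) points. This sketch shows that baseline only, as an assumption, and is not the thesis's fusion module. The helper `fuse_rgb` is hypothetical and repeats the pinhole back-projection so the snippet stays self-contained.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]                                   # 8-bit (R, G, B)
FusedPoint = Tuple[float, float, float, float, float, float]   # (x, y, z, r, g, b)


def fuse_rgb(depth: List[List[float]], image: List[List[Pixel]],
             fx: float, fy: float, cx: float, cy: float) -> List[FusedPoint]:
    """Attach each pixel's color (scaled to [0, 1]) to its back-projected 3D point.

    Hypothetical baseline: plain concatenation of XYZ and RGB channels.
    `depth` and `image` are assumed to be pixel-aligned, same-sized grids.
    """
    fused: List[FusedPoint] = []
    for v, (depth_row, rgb_row) in enumerate(zip(depth, image)):
        for u, (z, (r, g, b)) in enumerate(zip(depth_row, rgb_row)):
            if z <= 0:
                continue                           # no valid depth for this pixel
            x = (u - cx) * z / fx                  # pinhole back-projection
            y = (v - cy) * z / fy
            fused.append((x, y, z, r / 255.0, g / 255.0, b / 255.0))
    return fused


# Usage: one valid magenta pixel, one pixel without depth.
print(fuse_rgb([[2.0, 0.0]], [[(255, 0, 255), (10, 10, 10)]], 1.0, 1.0, 0.0, 0.0))
```

Plain concatenation treats color as three more input channels with fixed importance. A learned fusion module can instead decide how much the color cue should contribute for each point.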
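PointNet, which the thesis uses as its detection backbone, handles an unordered point set by applying the same small network to every point and then max-pooling each feature channel across all points. Because max is a symmetric function, the resulting global feature does not depend on point order. This toy sketch shows only that core operation with a single linear layer and hand-picked weights (hypothetical values). A real backbone stacks several learned layers and adds heads that regress box location, dimensions, and orientation, all omitted here.

```python
from typing import List, Sequence


def shared_mlp(point: Sequence[float], weights: List[List[float]],
               bias: List[float]) -> List[float]:
    """One linear layer + ReLU applied to a single point.

    The same weights are used for every point (the "shared" MLP).
    """
    return [max(0.0, sum(w * p for w, p in zip(row, point)) + b)
            for row, b in zip(weights, bias)]


def global_feature(points: List[Sequence[float]], weights: List[List[float]],
                   bias: List[float]) -> List[float]:
    """PointNet-style global descriptor: per-point features, then channel-wise max."""
    if not points:
        raise ValueError("need at least one point")
    per_point = [shared_mlp(p, weights, bias) for p in points]
    # zip(*per_point) groups values by channel; max over points is order-invariant.
    return [max(channel) for channel in zip(*per_point)]


# Usage: hypothetical 3x3 weights; reordering the points leaves the output unchanged.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
B = [0.0, 0.0, 0.0]
pts = [(1.0, 2.0, 3.0), (4.0, -1.0, -2.0)]
print(global_feature(pts, W, B), global_feature(pts[::-1], W, B))
```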
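Results are reported as AP for the Car category at an IoU threshold of 0.7. Under that threshold, a predicted box counts as a true positive only if its intersection-over-union with a ground-truth box is at least 0.7. This minimal sketch of the matching test relies on a simplifying assumption: boxes are axis-aligned and given as (x_min, y_min, z_min, x_max, y_max, z_max). KITTI's official evaluation instead uses boxes rotated about the vertical axis. The function names are illustrative.

```python
from typing import Tuple

# (x_min, y_min, z_min, x_max, y_max, z_max); orientation is ignored in this sketch.
Box = Tuple[float, float, float, float, float, float]


def iou_3d_axis_aligned(a: Box, b: Box) -> float:
    """3D intersection-over-union of two axis-aligned boxes."""
    inter = 1.0
    for i in range(3):                               # x, y, z extents in turn
        overlap = min(a[i + 3], b[i + 3]) - max(a[i], b[i])
        if overlap <= 0:
            return 0.0                               # disjoint along this axis
        inter *= overlap
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


def is_true_positive(pred: Box, gt: Box, threshold: float = 0.7) -> bool:
    """A detection matches a ground-truth box if IoU >= threshold (0.7 for Car)."""
    return iou_3d_axis_aligned(pred, gt) >= threshold


# Usage: a prediction that is 20% too short overlaps with IoU 0.8 and still matches.
print(is_true_positive((0, 0, 0, 10, 10, 8), (0, 0, 0, 10, 10, 10)))
```

Ranking detections by confidence and applying this test to each one yields the precision-recall curve from which AP is computed. That step is omitted here.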
Keywords/Search Tags: 3D object detection, outdoor scene, autonomous driving, data representation