3D Object Detection And Optimization Based On Depth Information

Posted on: 2020-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Q Zhao
Full Text: PDF
GTID: 2518306215954659
Subject: Mechanical and electrical engineering
Abstract/Summary:
Object detection is a fundamental problem in computer vision. Its core task is to determine the position and size of specific objects in any given image by means of a search strategy and an object recognition algorithm. Object detection based on 2D images is now mature, and most research relies on 2D optical images acquired by RGB cameras. The real world, however, is three-dimensional. In 3D applications such as autonomous driving, virtual reality, and robotics, 2D object detection lacks depth information and therefore cannot capture the higher-dimensional features of the scene; that is, it cannot describe an object's location, size, and orientation in the real scene. At the same time, the growing depth and size of convolutional neural networks place heavy demands on the power consumption and running speed of devices. It is therefore especially important to combine depth information for 3D object detection and to optimize the network, which will also promote the development of artificial intelligence technology. This thesis surveys the methods applied to the 3D object detection problem in the relevant fields and summarizes their advantages and disadvantages. On this basis, it focuses on improving the use of prior information and on model compression and acceleration. The main contents are as follows:

(1) Estimating the prior orientation of the 3D object proposal. In existing methods, the prior orientation of the 3D object proposal is a uniform preset value. To address this, this thesis proposes a new method for extracting the prior orientation of the 3D object proposal from 2D detection and segmentation information. First, the 2D object detection box and the instance segmentation are obtained from the color and depth information, and the key points used to calculate the prior orientation are extracted from the segmented instance. Then, in a key-point optimization step, uncertain points are excluded and false points are corrected. Finally, a point cloud is reconstructed from the depth information to obtain the 3D coordinates of the key points, which determine the prior orientation of the 3D object proposal.

(2) Extracting the 3D object proposal by combining vanishing points with the prior orientation. Existing methods rely on template information to extract the 3D object proposal. Instead, this thesis uses the prior orientation, Euler angles, and the principle of perspective projection to calculate three mutually orthogonal vanishing points in the 3D coordinate system and projects them into the pixel coordinate system. The top edge of the 2D proposal is then sampled at a preset interval to obtain the first vertex of the 3D object proposal in the pixel coordinate system. Finally, the remaining seven vertices are calculated from the linear relationships between the three vanishing points and the vertices, and are converted into the 3D coordinate system to obtain the complete 3D object proposal.

(3) 3D object detection based on 2.5D information. 3D object detection methods based on 3D information suffer from a large amount of computation and low information utilization. This thesis therefore uses 2.5D information: an RGB-D camera provides color and depth information, and the color features and depth features are each computed by a neural network. Features are then extracted by the ROI pooling layer using the 2D object detection box and the context information, and are fused along the channel dimension. Finally, the 3D object proposal is classified and refined according to the fused features.

(4) Model compression and acceleration. To address the large model size, heavy computation, and slow running speed of 3D object detection, a “lossless” method is proposed that merges the BatchNorm layer and the Scale layer into the adjacent Convolution layer without loss of precision. This is combined with a “lossy” method: the neural network is further compressed by channel pruning, in which representative neurons of a given layer are selected by LASSO regression and the original output of the layer is then reconstructed from the selected neurons.
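
The point cloud reconstruction step in part (1) can be illustrated with a minimal Python sketch. It assumes a pinhole camera model with hypothetical intrinsics (fx, fy, cx, cy), and it assumes that the orientation is read as the yaw of the line through two key points in the camera X-Z plane. The abstract specifies neither detail, and the function names are illustrative only.

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth into camera coordinates (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def yaw_from_keypoints(p_a, p_b):
    """Yaw (radians) of the direction p_a -> p_b, measured in the X-Z plane from +X."""
    d = p_b - p_a
    return np.arctan2(d[2], d[0])

# Hypothetical intrinsics and key points, for illustration only.
fx = fy = 500.0
cx, cy = 320.0, 240.0
p_a = backproject(320.0, 240.0, 2.0, fx, fy, cx, cy)  # principal point, 2 m away
p_b = backproject(420.0, 240.0, 2.0, fx, fy, cx, cy)  # 100 px to the right, same depth
p_c = backproject(320.0, 240.0, 3.0, fx, fy, cx, cy)  # same pixel, 1 m deeper

yaw_ab = yaw_from_keypoints(p_a, p_b)  # edge parallel to the image plane
yaw_ac = yaw_from_keypoints(p_a, p_c)  # edge pointing straight away from the camera
```

Two key points at the same depth yield a yaw of zero, while two key points along the optical axis yield a yaw of pi/2, which is the behavior one would expect from an orientation prior derived from depth.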
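
The vanishing-point computation in part (2) can be sketched as follows. Under a pinhole model, the vanishing point of a 3D direction d is the homogeneous image point K d, so the three axes of an object rotated by R have vanishing points given by the columns of K R. The Euler-angle convention, the intrinsic matrix K, and the angle values below are assumptions made for illustration; the abstract does not state them.

```python
import numpy as np

def rotation_from_euler(yaw, pitch, roll):
    """R = Ry(yaw) @ Rx(pitch) @ Rz(roll); this axis order is an assumed convention."""
    c_y, s_y = np.cos(yaw), np.sin(yaw)
    c_p, s_p = np.cos(pitch), np.sin(pitch)
    c_r, s_r = np.cos(roll), np.sin(roll)
    Ry = np.array([[c_y, 0.0, s_y], [0.0, 1.0, 0.0], [-s_y, 0.0, c_y]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, c_p, -s_p], [0.0, s_p, c_p]])
    Rz = np.array([[c_r, -s_r, 0.0], [s_r, c_r, 0.0], [0.0, 0.0, 1.0]])
    return Ry @ Rx @ Rz

def vanishing_points(K, R):
    """Homogeneous vanishing points of the three object axes (columns of K @ R)."""
    return K @ R

# Hypothetical intrinsics and prior orientation, for illustration only.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = rotation_from_euler(yaw=0.3, pitch=0.1, roll=0.05)
V = vanishing_points(K, R)

# Pixel coordinates; valid here because no axis is parallel to the image plane.
# A zero third component would mean a vanishing point at infinity.
vp_pixels = V[:2] / V[2]

# Orthogonality check via the image of the absolute conic, omega = (K K^T)^-1:
# v_i^T omega v_j should be 0 for i != j and 1 for i == j.
omega = np.linalg.inv(K @ K.T)
gram = V.T @ omega @ V
```

The `gram` matrix equals the identity up to rounding, which confirms that the three vanishing points correspond to mutually orthogonal 3D directions.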
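
The “lossless” merge in part (4) follows from the fact that batch normalization at inference time is a per-channel affine map, which can be absorbed into the preceding convolution's weights and bias. The sketch below uses plain NumPy with a naive convolution to check the equivalence numerically. The function names, tensor shapes, and epsilon value are assumptions for illustration, not the thesis implementation.

```python
import numpy as np

def fold_bn_into_conv(weight, bias, mean, var, gamma, beta, eps=1e-5):
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the conv.

    weight: (C_out, C_in, kH, kW); all other arrays: (C_out,).
    """
    scale = gamma / np.sqrt(var + eps)
    folded_weight = weight * scale[:, None, None, None]
    folded_bias = (bias - mean) * scale + beta
    return folded_weight, folded_bias

def conv2d(x, weight, bias):
    """Naive 'valid' cross-correlation; x: (C_in, H, W)."""
    c_out, _, kh, kw = weight.shape
    h_out = x.shape[1] - kh + 1
    w_out = x.shape[2] - kw + 1
    out = np.empty((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(weight, patch, axes=([1, 2, 3], [0, 1, 2])) + bias
    return out

# Random layer parameters, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 8))
w = rng.normal(size=(4, 3, 3, 3))
b = rng.normal(size=4)
mean = rng.normal(size=4)
var = rng.uniform(0.5, 2.0, size=4)
gamma = rng.normal(size=4)   # Scale layer multiplier
beta = rng.normal(size=4)    # Scale layer offset
eps = 1e-5

# Reference: convolution followed by normalization and scaling.
conv_out = conv2d(x, w, b)
reference = (gamma[:, None, None] * (conv_out - mean[:, None, None])
             / np.sqrt(var[:, None, None] + eps) + beta[:, None, None])

# Folded: a single convolution with rewritten weights and bias.
fw, fb = fold_bn_into_conv(w, b, mean, var, gamma, beta, eps)
folded = conv2d(x, fw, fb)
max_abs_diff = np.abs(reference - folded).max()
```

The two outputs agree up to floating-point rounding, so the merged layer removes the normalization and scaling operations at inference without changing the result.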
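
The “lossy” step in part (4) can be sketched on a single fully connected layer. The example assumes that channel selection is posed as a LASSO problem over one coefficient per input channel and that the layer output is then reconstructed by least squares on the kept channels. The solver (a basic ISTA loop), the regularization weight, and the synthetic data are all assumptions chosen for illustration; the abstract does not describe these details.

```python
import numpy as np

def lasso_ista(A, y, lam, n_iter=500):
    """Minimize 1/(2n) * ||A @ beta - y||^2 + lam * ||beta||_1 with ISTA."""
    n = A.shape[0]
    lipschitz = np.linalg.norm(A, 2) ** 2 / n  # Lipschitz constant of the smooth term
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ beta - y) / n
        z = beta - grad / lipschitz
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / lipschitz, 0.0)  # soft threshold
    return beta

# Synthetic layer: 8 input channels, 6 outputs, 200 samples.
rng = np.random.default_rng(0)
n_samples, n_channels, n_outputs = 200, 8, 6
important = [1, 4, 6]
X = rng.normal(size=(n_samples, n_channels))
W = 1e-3 * rng.normal(size=(n_channels, n_outputs))           # near-useless channels
W[important] = rng.uniform(0.5, 1.5, size=(3, n_outputs))     # informative channels
Y = X @ W                                                     # original layer output

# One column per channel: that channel's flattened contribution to the output.
A = np.stack([np.outer(X[:, c], W[c]).ravel() for c in range(n_channels)], axis=1)
beta = lasso_ista(A, Y.ravel(), lam=0.05)

# Keep channels with a non-zero coefficient, then refit the weights by least squares
# so that the pruned layer reproduces the original output as closely as possible.
keep = np.flatnonzero(np.abs(beta) > 1e-6)
W_pruned = np.linalg.lstsq(X[:, keep], Y, rcond=None)[0]
rel_err = np.linalg.norm(X[:, keep] @ W_pruned - Y) / np.linalg.norm(Y)
```

In this synthetic setting the sparse coefficients single out the informative channels, and the refitted weights reproduce the original output with a small relative error. A real network would apply the same idea to sampled feature maps of a convolutional layer.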
Keywords/Search Tags: 3D object detection, vanishing point, depth information, prior orientation, compression and acceleration