
Research On 3D Object Detection Based On 2D Information

Posted on: 2022-10-14 | Degree: Master | Type: Thesis
Country: China | Candidate: X X Zhang | Full Text: PDF
GTID: 2518306515466904 | Subject: Software engineering
Abstract/Summary:
In recent years, with social development and technological progress, 2D image object detection has advanced considerably. However, many applications, such as autonomous driving and augmented reality (AR), require 3D understanding in addition to 2D bounding boxes. With the spread of 3D sensors on mobile devices and autonomous vehicles, large amounts of 3D data are being acquired and processed, so 3D understanding has become extremely important. This paper studies three-dimensional object detection: classifying object categories and estimating the three-dimensional bounding boxes of physical objects. One common approach to 3D object detection is driven by 2D detection results, which has significant limitations in real scenes, such as lighting changes, occlusion, and the limited expressiveness of a single information source. To address the loss of point features caused by uneven point cloud density, insufficient accuracy of point cloud feature extraction, and inaccurate bounding box prediction, this paper studies a point cloud segmentation network, a multi-view fusion network, and a network that adds an attention mechanism to the feature extraction process. The main research contents are as follows:

1. To address the problem that 3D object detection on RGB-D data is trained on uniformly sampled point clouds, so accuracy drops on real-scene point clouds, this paper proposes to use the PointConv network for detection, avoiding the loss of key point cloud features. The proposed method fully considers the permutation invariance of the input points, and a neural network that operates directly on the point cloud is better suited to point cloud segmentation. At the same time, PointNet lacks local feature extraction, lacks the ability to recognize fine-grained patterns and apply to complex
scenes, and does not consider density information when sampling, whereas point clouds in real scenes vary in density. The PointConv network increases the weights of points in sparse areas so that key points there are not lost, making the point cloud feature output more accurate. Therefore, using the PointConv network for 3D instance segmentation and 3D bounding box estimation yields more accurate results.

2. To address the influence of the serial structure, in which the result of 3D box estimation depends heavily on 2D detection, this paper proposes to fuse the RGB image features extracted by ResNet with the point cloud features extracted by the PointNet network and apply them in the 3D instance segmentation module. This improves the accuracy of 3D instance segmentation and 3D bounding box estimation and mitigates the inaccuracy caused by 3D point cloud data that 2D detection may miss. The fusion network takes as input the image features extracted by ResNet and the corresponding point cloud features produced by PointNet, then merges these features. The fusion process improves feature accuracy and supplements missing information, which largely resolves the heavy dependence of 3D box estimation on 2D detection.

3. To address inaccurate feature extraction caused by illumination and occlusion in the input point cloud data, and the weak local feature expression that results when max pooling destroys the information structure of the point cloud, this paper proposes a 3D object detection method based on a Convolutional Attention Mechanism (CAM). CAM first adds an attention mechanism to the first and last layers of the traditional feature extraction network, then merges the feature information of different layers, and finally performs a
normalization operation. While fusing local and global information, CAM significantly improves object detection accuracy in illuminated and occluded scenes.
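The density-compensation idea behind the first contribution can be illustrated with a small NumPy sketch. This is not the thesis's actual implementation (PointConv learns its weight and density functions with MLPs); the function names and the `radius` parameter below are hypothetical. Points in sparse regions (few neighbors) receive larger weights so they are not drowned out during feature aggregation:

```python
import numpy as np

def inverse_density_weights(points, radius=0.5):
    # Density = number of neighbors within `radius` (self included).
    # Sparse points get few neighbors, hence larger inverse-density weights.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    density = (dists < radius).sum(axis=1).astype(float)
    weights = 1.0 / density
    return weights / weights.max()  # normalize to (0, 1]

def weighted_pool(features, weights):
    # Density-compensated pooling: a crude stand-in for the learned
    # weighting inside PointConv.
    w = weights[:, None]
    return (features * w).sum(axis=0) / weights.sum()

# Toy cloud: a dense cluster of three points plus one isolated point.
pts = np.array([[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0], [5, 5, 5]], dtype=float)
feats = np.eye(4)  # one-hot feature per point, to trace contributions
w = inverse_density_weights(pts, radius=0.5)
pooled = weighted_pool(feats, w)
```

The isolated point ends up with the maximum weight (1.0), while each cluster point gets 1/3, so the sparse-region feature survives pooling instead of being averaged away.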
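The fusion step in the second contribution amounts to pairing each 3D point with the image feature at the pixel it projects to, then concatenating the two feature vectors. The sketch below assumes precomputed features and projection indices (in the thesis these would come from ResNet, PointNet, and camera calibration); all names here are illustrative:

```python
import numpy as np

def fuse_features(point_feats, image_feats, proj_idx):
    # point_feats : (N, Cp) per-point features (e.g. from PointNet)
    # image_feats : (H*W, Ci) flattened image feature map (e.g. from ResNet)
    # proj_idx    : (N,) pixel index each 3D point projects to
    gathered = image_feats[proj_idx]                        # (N, Ci)
    return np.concatenate([point_feats, gathered], axis=1)  # (N, Cp + Ci)

# Toy example: 5 points with 3-dim features, a flattened 4x4 map with 2 channels.
rng = np.random.default_rng(0)
point_feats = rng.normal(size=(5, 3))
image_feats = rng.normal(size=(16, 2))
proj_idx = np.array([0, 3, 7, 7, 15])  # two points may share a pixel
fused = fuse_features(point_feats, image_feats, proj_idx)
```

Because every point keeps its geometric features and gains appearance features, points missed or mislabeled by 2D detection still carry usable information into the 3D instance segmentation module.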
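The structure of the third contribution (attention on the first and last layers, merging layer features, then normalizing) can be sketched as a simple channel-attention block. This is a minimal NumPy illustration under the assumption that "attention" here is a squeeze-and-excite-style channel gate; the actual CAM design, layer widths, and weight shapes in the thesis may differ:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feats, w_att):
    # Squeeze: average over points; excite: per-channel sigmoid gate.
    gate = sigmoid(feats.mean(axis=0) @ w_att)  # (C,)
    return feats * gate

def cam_block(x, w1, w2, wa_in, wa_out):
    # Attention on the first layer, attention on the last layer,
    # merge the two attended feature maps, then normalize.
    h1 = channel_attention(np.maximum(x @ w1, 0.0), wa_in)
    h2 = channel_attention(np.maximum(h1 @ w2, 0.0), wa_out)
    merged = h1 + h2  # assumes equal widths, a simplification
    norm = np.linalg.norm(merged, axis=1, keepdims=True) + 1e-8
    return merged / norm  # row-wise L2 normalization

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 8))  # 32 points, 8 feature channels
w1, w2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
wa_in, wa_out = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
out = cam_block(x, w1, w2, wa_in, wa_out)
```

Merging an early (local) and a late (global) feature map is what lets the block combine fine geometry with scene-level context, which is the stated reason CAM helps under illumination changes and occlusion.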
Keywords/Search Tags:RGB-D data, three-dimensional, object detection, point cloud, PointConv network