
The Research On Key Technologies Of Object Detection Based On Deep Convolutional Neural Networks

Posted on: 2021-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: K Y Peng
GTID: 2428330647451055
Subject: Computer Science and Technology
Abstract/Summary:
Object detection is an important foundation of computer vision and image understanding, and it plays a key role in autonomous driving, military object recognition, and intelligent medical care. The task is to extract high-level semantic features of images from massive data and to give the categories and locations of the objects of concern. However, data in different scenarios have different characteristics: images in general object detection involve a variety of objects, while 3D object detection needs to be combined with point clouds. In view of these scenarios, this thesis first works on the detection sub-networks to improve horizontal bounding box detection in two-dimensional images. It then removes the horizontal box constraint and studies rotated box detection for remote sensing images from the perspective of abstract feature extraction and anchor setting. Finally, extending the data space from two dimensions to three, it uses sparse convolution to improve detection efficiency and designs the loss function to improve detection accuracy. Specifically, the main work of this thesis is as follows.

Firstly, to address the problem that multi-layer feature fusion does not make full use of the information in each layer, a two-stage image object detection method based on multiple detection sub-networks and regression fusion is proposed, and the image features are further refined with a non-local attention mechanism. First, because too many regions of interest significantly increase the computational cost of feature extraction in the later stage, the multi-detection sub-network method directly integrates the location regression information of each feature block, which improves computational efficiency at the cost of only a little accuracy. Second, to make full use of the features of the region of interest, the pooled dimensions of the region of interest are studied. Third, the non-local attention mechanism applied to the high-level semantic layer is essentially a spatial attention mechanism, which can provide rich information for the subsequent layers. Experiments demonstrate that the method is efficient and effective.

Secondly, for the problems of large scale variation and narrow objects in ship detection in remote sensing images, a deep high-resolution feature extraction method and an anchor setting strategy are proposed. The feature extraction network is the basis of the detection model, and rich deep semantic information is obtained by fully integrating the feature maps of each layer. Moreover, the receptive field is very important for object detection. By using dilated convolution to enlarge the receptive field without reducing the resolution, the feature network can adapt better to objects of different sizes. Because ships have extreme aspect ratios (very large or very small), some anchors used in general object detection are unsuitable. In addition, ships appear in various orientations. Therefore, it is necessary to adjust the aspect ratio of the anchors and to sample anchor orientations more densely. Experiments show that the method obtains better accuracy.

Finally, in the field of 3D object detection based on point cloud data, a sparse embedded convolution network is constructed. Because point clouds differ in the number of points, the point cloud space needs to be divided. Each divided unit is called a voxel, and the voxel feature encoder generates a feature representation of the same dimension for every voxel. The sparse convolutional middle extractor then restricts the output region so that output is produced only for activated input regions, which reduces the computation time of the convolution and improves efficiency. In the end, the output of the sparse convolutional middle extractor is used as the input of the region proposal network, and 1 × 1 convolutions are used to predict the category, location, and direction. In the loss function, the focal loss and the sine-error loss are introduced to improve the classification performance.
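
The two loss terms named at the end of the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas used in the thesis, so the code below assumes the commonly used definitions: the binary focal loss with weighting factor alpha and focusing parameter gamma, and a sine-error loss that applies a smooth L1 penalty to the sine of the angle difference. The function names, the default values alpha=0.25 and gamma=2.0, and the smooth L1 threshold beta are illustrative assumptions, not values taken from the thesis.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one prediction (assumed standard form).

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth label, 1 (object) or 0 (background)
    """
    # p_t is the probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # The (1 - p_t)^gamma factor down-weights well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def smooth_l1(x, beta=1.0):
    """Smooth L1 penalty: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def sine_error_loss(theta_pred, theta_true):
    """Angle regression loss on sin(theta_pred - theta_true) (assumed form).

    Two boxes whose headings differ by pi cover the same region, and
    this loss treats them as equal because sin(pi) is zero.
    """
    return smooth_l1(math.sin(theta_pred - theta_true))


# A confident correct prediction contributes far less than a confident wrong one.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
# A heading flipped by pi is (almost) not penalised by the sine-error term.
flipped = sine_error_loss(0.0, math.pi)
```

Under these assumptions the sine-error term cannot tell a box from its 180-degree flip, which is presumably why the abstract lists direction as a separate prediction alongside category and location.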
Keywords/Search Tags: Object Detection, Feature Fusion, Attention Mechanism, Dilated Convolution, Sparse Embedded Convolution