Research On Object Detection Method Based On Deep Learning

Posted on: 2021-03-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q S Lu
GTID: 1368330605981254
Subject: Information and Communication Engineering
Abstract/Summary:
Object detection is a fundamental and challenging topic in computer vision. Its main purpose is to identify all objects in an image and locate them. Building on a large body of research in China and abroad, this thesis explores the difficulties and challenges faced by current deep-learning-based object detection techniques and proposes solutions from three perspectives: the translation invariance of convolutional neural networks, the receptive field of convolutional filters, and high-resolution feature maps with feature fusion. The contributions are summarized as follows.

This thesis proposes a position-sensitive grid convolutional neural network (G-CNN). State-of-the-art object detection networks rely on convolutional neural networks (CNNs) pre-trained on a large auxiliary dataset designed for image-level classification; the pre-trained CNN is then fine-tuned on the object detection dataset. The image-level classification task favors translation invariance: moving an object inside an image should not change how the image is classified. The object detection task, by contrast, needs localization representations that are translation-variant to some extent: translating an object inside a candidate box should produce a distinguishable response that indicates how well the candidate box overlaps the object. The G-CNN consists of a grid convolutional layer and a grid pooling layer. The grid convolutional layer outputs feature maps that are sensitive to specific positions of the object, and the output cells of the grid pooling layer alternately come from different feature maps. Through the choice of grid type, the G-CNN can control its sensitivity to object translation, which addresses the problem that CNNs designed for image classification are too strongly translation-invariant. Experimental results show that the G-CNN improves object detection accuracy.

This thesis also proposes a new module, named receptive field adaptive convolution, that adaptively determines the receptive field size of a convolutional filter. In a deep convolutional network, the receptive field size of a convolutional filter is crucial for object detection, because each output must respond to an image region of suitable size to capture the right information. This size is fixed by the inherently fixed geometric structure of a CNN. However, objects of interest vary significantly in size within detection images, and high-level convolutional filters encode semantic features across spatial positions, so determining the receptive field size adaptively is desirable. Receptive field adaptive convolution dilates the convolutional filter with multiple dilation values, computes the convolution separately for each dilation, and selects the maximum value as the output. Experimental results show that it adapts the receptive field to the object scale, extracts better feature maps, and improves object detection accuracy.

Finally, this thesis presents an object detection architecture that uses convolutional networks with high-resolution feature map fusion. Large scale variation across objects and small object detection are the main challenges for object detection. State-of-the-art CNNs have large strides that produce a very coarse representation of the input image, which makes small objects hard to detect. The high-resolution feature map fusion module increases the resolution of the top feature map by a factor of 4 and fuses multi-level feature maps while keeping the input image size unchanged. In addition, the method adaptively recalibrates channel-wise feature responses by explicitly modelling the interdependencies between channels. Experimental results show that the high-resolution feature maps extracted by this method improve object detection accuracy, especially for small objects.
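The abstract gives only the outline of grid pooling, so the following is one possible reading of "output cells alternately come from different feature maps", not the thesis's implementation. It assumes the grid convolutional layer has already produced two position-sensitive feature maps and that a hypothetical "checkerboard" grid type makes pooled cell (i, j) average its bin from map (i + j) mod 2. The names `grid_pool` and `checkerboard` are illustrative. The idea resembles position-sensitive RoI pooling in R-FCN, where each output cell reads from its own group of score maps.

```python
from typing import Callable, List, Tuple

Map = List[List[float]]  # one feature map, H x W


def checkerboard(i: int, j: int) -> int:
    """Hypothetical grid type: neighbouring cells alternate between map 0 and map 1."""
    return (i + j) % 2


def grid_pool(maps: List[Map], box: Tuple[int, int, int, int], k: int,
              grid_type: Callable[[int, int], int]) -> Map:
    """Average-pool a candidate box into a k x k grid of cells.

    Cell (i, j) is pooled from maps[grid_type(i, j)], so adjacent cells can
    read from different position-sensitive maps. box = (y0, x0, y1, x1) in
    feature-map coordinates with exclusive ends; assumes the box is at least
    k pixels tall and wide so that every bin is non-empty.
    """
    y0, x0, y1, x1 = box
    h, w = y1 - y0, x1 - x0
    out = [[0.0] * k for _ in range(k)]
    for i in range(k):
        ys, ye = y0 + i * h // k, y0 + (i + 1) * h // k
        for j in range(k):
            xs, xe = x0 + j * w // k, x0 + (j + 1) * w // k
            src = maps[grid_type(i, j)]
            vals = [src[y][x] for y in range(ys, ye) for x in range(xs, xe)]
            out[i][j] = sum(vals) / len(vals)
    return out


if __name__ == "__main__":
    zeros = [[0.0] * 4 for _ in range(4)]
    ones = [[1.0] * 4 for _ in range(4)]
    # Constant maps make the alternation visible: [[0.0, 1.0], [1.0, 0.0]]
    print(grid_pool([zeros, ones], (0, 0, 4, 4), 2, checkerboard))
```

Swapping `grid_type` is the knob the abstract describes: a grid type that always returns 0 reads every cell from a single map, while the checkerboard mixes two maps and so ties each cell to a position inside the box.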
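The abstract states the mechanism of receptive field adaptive convolution directly: dilate one filter with several dilation values, convolve separately, and keep the maximum. The pure-Python sketch below follows that description for a single-channel map and a 3x3 kernel. The dilation set (1, 2, 3), zero padding, and the absence of bias and activation are assumptions for illustration, and the function names are hypothetical. It also records which dilation won at each position, which is the receptive field the module effectively selected there.

```python
from typing import List, Sequence, Tuple

Map = List[List[float]]  # single-channel feature map, H x W


def dilated_conv2d(img: Map, kernel: Map, d: int) -> Map:
    """Same-size 2-D cross-correlation with a 3x3 kernel dilated by d.

    Taps falling outside the image read zero, i.e. zero padding of d pixels.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    # Kernel taps are spaced d pixels apart around (y, x).
                    yy, xx = y + (ky - 1) * d, x + (kx - 1) * d
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[ky][kx] * img[yy][xx]
            out[y][x] = acc
    return out


def rf_adaptive_conv(img: Map, kernel: Map,
                     dilations: Sequence[int] = (1, 2, 3)) -> Tuple[Map, List[List[int]]]:
    """Run the same kernel at every dilation and keep the element-wise maximum.

    Returns the output map and, per position, the dilation that produced it
    (the first listed dilation wins ties).
    """
    responses = [dilated_conv2d(img, kernel, d) for d in dilations]
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    chosen = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [r[y][x] for r in responses]
            best = max(vals)
            out[y][x] = best
            chosen[y][x] = dilations[vals.index(best)]
    return out, chosen


if __name__ == "__main__":
    # Kernel that fires on two points left and right of the centre.
    kernel = [[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
    img = [[0.0] * 7 for _ in range(7)]
    img[3][1] = img[3][5] = 1.0  # an "object" whose two parts are 4 pixels apart
    out, chosen = rf_adaptive_conv(img, kernel)
    print(out[3][3], chosen[3][3])  # 2.0 2: dilation 2 spans the object exactly
```

Because the selection is an element-wise maximum, a gradient-based framework would, as with max pooling, pass gradients only through the winning dilation at each position.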
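The recalibration sentence in the abstract matches the wording of squeeze-and-excitation (SE) blocks (Hu et al., 2018), so the sketch below assumes an SE-style channel gate; the thesis may differ in detail. Since the abstract names neither the upsampling nor the fusion rule, it also assumes, hypothetically, that the top feature map is enlarged 4x by nearest-neighbour upsampling and fused with a same-sized lower-level map by element-wise summation. In a real network `w1` and `w2` are learned; here they are fixed for illustration, and all function names are hypothetical.

```python
import math
from typing import List

Map = List[List[float]]     # one channel, H x W
Feat = List[Map]            # C channels
Matrix = List[List[float]]


def upsample_nearest(feat: Feat, factor: int = 4) -> Feat:
    """Nearest-neighbour upsampling: every pixel becomes a factor x factor block."""
    return [[[row[x // factor] for x in range(len(row) * factor)]
             for row in ch for _ in range(factor)]
            for ch in feat]


def fuse_sum(a: Feat, b: Feat) -> Feat:
    """Hypothetical fusion rule: element-wise sum of two same-shaped features."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in m]


def se_recalibrate(feat: Feat, w1: Matrix, w2: Matrix) -> Feat:
    """SE-style channel gating; w1 is (C/r) x C, w2 is C x (C/r)."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate in (0, 1) per channel.
    hidden = [max(0.0, v) for v in matvec(w1, z)]
    gates = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w2, hidden)]
    # Scale: every value of channel c is multiplied by that channel's gate.
    return [[[g * v for v in row] for row in ch] for ch, g in zip(feat, gates)]


if __name__ == "__main__":
    top = [[[1.0]], [[2.0]]]                                 # 2 channels, 1 x 1 (coarse)
    low = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]  # 2 channels, 4 x 4 (finer)
    fused = fuse_sum(upsample_nearest(top, 4), low)
    out = se_recalibrate(fused, [[1.0, 0.0]], [[1.0], [-1.0]])  # C/r = 1, C = 2
    print(len(out[0]), len(out[0][0]))  # 4 4
```

As a worked example of the factor of 4: if the top map had a stride of 32 pixels (an assumption, not stated in the abstract), upsampling it 4x would bring it to stride 8, so each cell covers a far smaller image patch and small objects span more cells.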
Keywords/Search Tags: Grid convolution, Receptive field adaptive convolution, High-resolution feature map, Deep learning, Object detection