Enhancing Deep Learning Based Object Detection

Posted on: 2022-05-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Tanvir Ahmad
GTID: 1488306338998289
Subject: Control Science and Engineering
Abstract/Summary:
Object detection is a fundamental research direction in computer vision. Its main purpose is to identify all the objects in an image and locate them. As the cornerstone of image understanding, object detection has become the foundation for higher-level, more complex visual tasks. In recent years, with the rapid development of deep learning, huge breakthroughs have been made in object detection. At present, it is widely adopted in many fields, such as image segmentation, scene understanding, object tracking, image description, event detection, autonomous driving, intelligent monitoring, and medical image analysis, to name a few.

Despite these breakthroughs, existing algorithms still suffer from low accuracy, especially in complex scenes and when detecting small objects. In real-life scenarios, objects may appear at varying scales, accompanied by dramatic changes in lighting conditions and by occlusion, which makes object detection even more challenging. This dissertation carries out a series of studies on object detection in complex scenes. Building on existing object detection algorithms, it proposes several novel methods to improve detection performance. The main contributions are summarized as follows.

First, to address low accuracy and to extract fine features of objects, this dissertation proposes a feature fusion method based on a deep convolutional neural network. The method increases detection accuracy by introducing deconvolutional layers and adopting a multi-scale, full feature fusion strategy, which extracts features from different layers to enhance the feature maps used at the prediction layer. After obtaining feature maps of different scales, the method transfers fine details from the shallow layers to the deep layers. In this way, semantic features are also transferred between the shallow and deep layers, so the newly generated feature map contains both finely detailed features and rich semantic information about objects. This strategy preserves the receptive fields of objects, especially small ones, which lets the model detect small objects with high confidence and addresses the small-object weakness of current object detectors. Extensive experimental results show that fusing features from different layers improves detection performance, especially for small objects.

Second, to address low recall and localization error, and to extract and learn better features, especially when both small and large objects are present in an image, this dissertation proposes a novel method that combines an inception model containing a 1×1 convolutional layer, a spatial pyramid pooling layer, and a mean squared error loss. The loss function guides the optimization of the class to which the object belongs and of the position of the bounding box used to detect it. The inception model deepens and widens the network: because convolutional kernels of different scales are connected in parallel, more effective multi-scale features are extracted. A convolutional neural network normally needs an input image of fixed size. To overcome this, a spatial pyramid pooling layer is added, which produces a fixed-size output for an input image of any size or aspect ratio and pools features at varying scales. The spatial bins have sizes proportional to the image size, so the number of bins is fixed regardless of the image size. This not only improves network performance but also dramatically reduces computation time by avoiding repeated computation of the convolutional features. Experimental results show that the proposed method greatly reduces the miss rate and makes fewer recognition and detection errors.

Third, to address low accuracy and to extract discriminative features of objects, this dissertation proposes a discriminative feature learning technique. In the proposed method, a triplet embedding is constructed with the deep convolutional neural network and trained end to end, to enhance accuracy and strengthen the separation of positive from negative samples across object classes in complex scenarios. The framework learns a mapping from images to a compact Euclidean space, where distance directly corresponds to a measure of object similarity. This mitigates the effect of intra-class and inter-class variation, leading to highly accurate object detection at low computational cost. Extensive experiments demonstrate the effectiveness of the proposed method, which also handles small objects well.
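
The fixed-size property of spatial pyramid pooling described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function name `spatial_pyramid_pool`, the pyramid levels `(1, 2, 4)`, and the use of max pooling over a single 2-D feature map are assumptions chosen for illustration, written in plain Python with no deep learning library.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one 2-D feature map into a fixed-length vector.

    feature_map: non-empty list of equal-length rows (H x W) of numbers.
    levels: hypothetical pyramid levels; level n splits the map into n x n bins.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin edges scale with the input size, so each level always
            # yields n * n bins regardless of H and W.
            top = (i * height) // n                      # floor
            bottom = ((i + 1) * height + n - 1) // n     # ceiling
            for j in range(n):
                left = (j * width) // n
                right = ((j + 1) * width + n - 1) // n
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


# Two feature maps of different sizes give vectors of the same length.
small = [[r * 5 + c for c in range(5)] for r in range(4)]      # 4 x 5
large = [[r * 13 + c for c in range(13)] for r in range(9)]    # 9 x 13
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

With levels `(1, 2, 4)` the output always has 1 + 4 + 16 = 21 values per feature map, which is what allows a fixed-size fully connected layer to follow a convolutional backbone fed with images of arbitrary size. Bins may overlap by one row or column when the map size is not divisible by the level; that rounding choice is also an assumption of this sketch.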
Keywords/Search Tags:deep learning, object detection, complex scene, feature fusion, triplet embedding