A Study Of Feature Map Enlargement Methods For Object Detection Using CNN

Posted on: 2020-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: Islam Mohammad Khairul
GTID: 2428330590961612
Subject: Software engineering
Abstract/Summary:
The main goal of this paper is to compare feature map enlargement methods for CNN-based object detection and to find the best combination with respect to speed and accuracy. The role of a convolutional neural network (CNN) is to reduce an image to a form that is easier to process without losing the features that are critical for a good prediction. CNNs were chosen because they can build an internal representation of a two-dimensional image. This allows the model to learn position- and scale-invariant structures in the data, which is important when working with images. Many deep learning frameworks exist, and new ones appear frequently to address specific niches. We rely on such frameworks because they let us build deep learning models quickly and easily without going into the details of the underlying algorithms. They also provide a clear and concise way to define models from a collection of pre-built, optimized components that improve performance, parallelize processes to reduce computation time, and compute gradients automatically. Moreover, the algorithms reviewed here were chosen because all of the models use CNNs for image feature extraction.

In recent years, object detection has evolved rapidly within the computer vision discipline. Object detection is an integral part of computer vision and image processing. It deals with detecting instances of semantic objects of a particular class (such as humans, buildings, cars, trees, cycles, etc.) in digital images and videos. In modern convolutional object detection systems, there are numerous ways to trade accuracy for speed and memory usage. A number of successful systems have been proposed in recent years, but fair comparisons between them are difficult because they differ in feature extractors (e.g. VGG, residual networks), fixed image resolutions, and development environments (hardware and software). In this paper, we focus on three feature map upsampling methods: bilinear interpolation, nearest neighbor interpolation and pixel shuffle interpolation. We compare their results in a CNN-based detector, which provides some guidance on the performance of feature extraction methods in object detection.

The primary task of an object detection algorithm is to locate each object of interest by drawing a bounding box around it. An image is not limited to a single bounding box: several bounding boxes may indicate multiple objects of interest within one image. Another fundamental problem in object detection is detecting objects at multiple scales. A scale-transfer module has been used to balance the conflict between resolution and semantics. A scale-transfer layer is used to obtain high-resolution feature maps for detecting small objects, and a pooling layer is used to obtain feature maps with a large receptive field for detecting large objects. Detection on shallow feature maps is also problematic, because distinguishing small objects from the background requires more semantic information than shallow layers provide. To predict object bounds and objectness scores simultaneously at each position, a fully convolutional network called the Region Proposal Network (RPN) has been used. Region proposal algorithms hypothesize object locations, and the RPN shares full-image convolutional features with the detection network. With the continuous progress of object detection and semantic segmentation, instance segmentation has arisen with new problems; it is addressed by adding a branch that predicts an object mask in parallel with the existing branch for bounding box regression.

In recent years, many object detection models have been developed, such as R-CNN, Fast R-CNN, Faster R-CNN, SSD, YOLO, R-FCN, FPN, STDN and Mask R-CNN. Feature Pyramid Network (FPN) is a feature extractor built around the pyramid concept and designed with accuracy and speed in mind. It replaces the feature extractor of detectors such as Faster R-CNN and generates multiple feature map layers (multi-scale feature maps) that carry better-quality information than the regular feature pyramid used for object detection. Using FPN as an extension of Faster R-CNN, we run detection models with bilinear interpolation, nearest neighbor interpolation and pixel shuffle interpolation, and compare them on average precision (AP) for bounding boxes. We also measure the average recall (AR) of all of these models. For the experiments, we shorten training from 90k to 60k epochs to train faster. All models are evaluated with IoU > 0.5 and a maximum of 100 objects per image. Across all the models evaluated, pixel shuffle interpolation improves the result (0.215 AP) for small objects. We also report experimental results on the COCO 2014 minival dataset and discuss the comparison between them.
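To make the three enlargement methods named in the abstract concrete, the following is a minimal pure-Python sketch that upsamples a small feature map by an integer factor r. It is not the thesis implementation. The function names, the nested-list layout (channels, then rows, then columns), the half-pixel-centre convention used for bilinear sampling, and the channel ordering used for pixel shuffle are all assumptions made for illustration. In a real detector, the C*r*r input channels consumed by pixel shuffle would normally be produced by a preceding convolution, which is omitted here.

```python
# Illustrative sketch only: feature maps are nested lists shaped [C][H][W].
# All function names below are hypothetical, not taken from the thesis.

def nearest_upsample(fmap, r):
    """Nearest neighbor: each output pixel copies the input pixel it falls in."""
    return [
        [[ch[y // r][x // r] for x in range(len(ch[0]) * r)]
         for y in range(len(ch) * r)]
        for ch in fmap
    ]


def bilinear_upsample(fmap, r):
    """Bilinear: weighted average of the four nearest input pixels.

    Assumes half-pixel centres, with sample positions clamped to the border.
    """
    out = []
    for ch in fmap:
        h, w = len(ch), len(ch[0])
        rows = []
        for y in range(h * r):
            sy = min(max((y + 0.5) / r - 0.5, 0.0), h - 1)
            y0 = int(sy)
            y1 = min(y0 + 1, h - 1)
            wy = sy - y0
            row = []
            for x in range(w * r):
                sx = min(max((x + 0.5) / r - 0.5, 0.0), w - 1)
                x0 = int(sx)
                x1 = min(x0 + 1, w - 1)
                wx = sx - x0
                top = ch[y0][x0] * (1 - wx) + ch[y0][x1] * wx
                bottom = ch[y1][x0] * (1 - wx) + ch[y1][x1] * wx
                row.append(top * (1 - wy) + bottom * wy)
            rows.append(row)
        out.append(rows)
    return out


def pixel_shuffle(fmap, r):
    """Pixel shuffle: rearrange C*r*r channels of size HxW into C channels
    of size (H*r)x(W*r). No values are interpolated, only moved.
    """
    c_in = len(fmap)
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    h, w = len(fmap[0]), len(fmap[0][0])
    out = []
    for c in range(c_in // (r * r)):
        plane = [[0] * (w * r) for _ in range(h * r)]
        for i in range(r):          # row offset inside each r x r output cell
            for j in range(r):      # column offset inside each r x r output cell
                src = fmap[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        plane[y * r + i][x * r + j] = src[y][x]
        out.append(plane)
    return out


if __name__ == "__main__":
    one_channel = [[[1, 2], [3, 4]]]            # shape [1][2][2]
    four_channels = [[[1]], [[2]], [[3]], [[4]]]  # shape [4][1][1]
    print(nearest_upsample(one_channel, 2))
    print(bilinear_upsample(one_channel, 2))
    print(pixel_shuffle(four_channels, 2))
```

The practical difference is that nearest neighbor and bilinear interpolation enlarge a map with fixed, parameter-free rules, while pixel shuffle only relocates values, so any learning happens in the layer that produces its input channels.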
Keywords/Search Tags: Object Detection, Feature Maps, Bounding Box