Target Region Extraction And Fine-Grain Image Classification Based On CNN

Posted on: 2020-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Dai
Full Text: PDF
GTID: 2428330578458857
Subject: Software engineering
Abstract/Summary:
Image classification is an important part of computer vision, and it has made great progress since the field began. In large-category classification, top-5 accuracy has exceeded 90%, surpassing human-level performance. Accuracy on subclass classification within a single large category has reached more than 80%, but it has not yet reached the level of a professional, and these models tend to be tailored to particular datasets and may not generalize well.

Current fine-grained classification algorithms mainly consist of two steps. The first step locates the region of the target object or of its key parts. The second step extracts features from that region as the input to the classifier. In the first step, existing fine-grained algorithms rely on manually labeled information to extract regions from the input image, which is often expensive. In the second step, existing fine-grained algorithms train a single convolutional neural network for classification, and a single network may fail to extract some features. In this thesis, we use a weakly supervised region extraction algorithm to extract the target region, and then use a CNN to extract features from the target region for fine-grained classification.

This thesis has two main tasks. The first is to use characteristics of existing networks to reduce the impact of large noise (noise objects in the foreground, or noise objects that occupy a larger image area than the target object) on image classification without requiring extensive manual annotation, while keeping the important feature regions of the target object as far as possible and generating a cropped image. A network model trained on the ImageNet dataset can already distinguish large categories very well. Merging the output feature maps of the network's pool5 layer (using VGG16 as an example) shows that the network identifies the target object by its key parts (special parts, etc.). This property can be used to realize network attention, which helps improve fine-grained classification performance. When feature maps of other sizes from the network are also merged, we find that when there is no strong background (a strong background being one in which the noise object is too large or the target object is too small), the multi-scale feature map fusion fits the boundary of the target object closely. When there is a strong background, the "attention" of the network is easily attracted by noise objects (people, trunks, etc.), causing the region extraction network to misjudge the location.

The second task is to make reasonable modifications to and combinations of existing networks, taking some important factors into account, to achieve fine-grained classification of the cropped images. The original dataset and the cropped dataset from Chapter 3 of this thesis are used as training datasets to compare network accuracy under the same experimental conditions. The networks used are divided into classic CNNs and improved CNNs.
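The abstract does not spell out how the pool5 feature maps are merged or how the crop is derived, so the following is only a minimal sketch under stated assumptions: the merge is taken to be a channel-wise sum, the foreground mask is taken to be every position above the mean activation, and the crop is the bounding box of that mask scaled back to image coordinates. The function name `crop_box_from_features` and the `threshold` parameter are hypothetical, and the feature tensor is assumed to have already been computed by some pretrained network (for VGG16 with a 224x224 input, pool5 has shape 512x7x7).

```python
import numpy as np


def crop_box_from_features(features, image_size, threshold=None):
    """Estimate a crop box (left, top, right, bottom) from a (C, H, W) feature tensor.

    Assumptions (not confirmed by the abstract): channels are merged by
    summation, and positions above the mean activation count as target object.
    """
    # Merge all channels into a single (H, W) activation map.
    activation = features.sum(axis=0)

    # Default threshold: the mean of the merged map.
    if threshold is None:
        threshold = activation.mean()
    mask = activation > threshold

    img_h, img_w = image_size
    if not mask.any():
        # Nothing stands out, so fall back to the whole image.
        return (0, 0, img_w, img_h)

    # Rows and columns of the feature grid that contain foreground cells.
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]

    # Scale grid coordinates back to pixel coordinates.
    feat_h, feat_w = activation.shape
    scale_y = img_h / feat_h
    scale_x = img_w / feat_w
    top = int(rows[0] * scale_y)
    bottom = int((rows[-1] + 1) * scale_y)
    left = int(cols[0] * scale_x)
    right = int((cols[-1] + 1) * scale_x)
    return (left, top, right, bottom)
```

The returned box could then be used to crop the input image before it is passed to the classification network. Because the box covers every above-threshold cell, a large noise object that also activates strongly would enlarge or shift it, which is consistent with the misjudged locations the abstract reports for strong backgrounds.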
Keywords/Search Tags: region extraction, attention, fine-grained classification, multi-scale, CNN