Fine-grained Visual Classification Based On Convolutional Neural Network

Posted on: 2020-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: T Q Deng
Full Text: PDF
GTID: 2428330620456143
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, with the continuous development of deep learning, research on Fine-grained Visual Classification (FGVC) has made great progress. In contrast to coarse-grained visual classification, FGVC distinguishes the subclasses within one broad category of images. FGVC based on weakly supervised annotation, which uses only image-level labels and requires no part-level annotation, is becoming an important topic. Fine-grained images show small variance between different classes and large variance within the same class, so FGVC is more difficult than coarse-grained visual classification. Current FGVC research still faces several problems, including insufficient feature extraction, insufficient feature utilization, and difficulty in mining the most representative regions. This thesis addresses these problems with the cross bilinear vector, combined with strategies such as Ensemble Learning and Reinforcement Learning. The main work and contributions of the thesis are as follows:

1. To address insufficient feature extraction and insufficient feature utilization for fine-grained images, the Multi-stream Multi-scale Cross Bilinear CNN is proposed. The method uses the cross bilinear vectors of a multi-stream network to extract more detailed local features of the image, which addresses insufficient feature extraction. Insufficient feature utilization is addressed by random mixup augmentation of images and by fusing multi-scale bottom-layer bilinear vectors. Experiments show that the method reaches state-of-the-art accuracy compared with prior methods on three public fine-grained datasets: CUB-200-2011, Stanford Cars and Aircraft.

2. Because the cross bilinear vectors extracted by the different streams in the above method differ in discriminative ability, the Multi-stream Multi-scale Cross Bilinear CNN based on Bagging is proposed, using the bagging strategy from ensemble learning. The method builds several base classifiers from the cross bilinear vectors extracted by the multi-stream networks and integrates them by majority voting or weighted voting to predict the fine-grained category of an image. It also uses skip connections to fuse the bottom and shallow features of the image, which further addresses insufficient utilization of image features. Experiments show that the method reaches state-of-the-art accuracy and improves on the above method on the same three classical fine-grained datasets: CUB-200-2011, Stanford Cars and Aircraft.

3. To address the difficulty of mining the most discriminative regions of fine-grained images, a method based on the Cross Bilinear CNN and the Actor-Critic strategy is proposed. The method uses the Actor-Critic strategy to find the image regions that deserve the most attention. The Actor module generates the top M most discriminative candidate regions. The Critic module evaluates the state value of the action using the cross bilinear vector. The method then uses a sorting-consistency criterion to compute the reward of the action in the current state; the resulting value advantage is fed back to the Actor module, which updates the regions it outputs. The fine-grained category is predicted from these most discriminative regions together with the original image features. In this way the method can better mine the most informative regions of fine-grained images. Experiments show that the method reaches state-of-the-art accuracy compared with prior methods on the three classical fine-grained datasets: CUB-200-2011, Stanford Cars and Aircraft.

All of the above algorithms run at real-time speed during inference. Finally, the shortcomings of the above methods and directions for future improvement are analyzed.
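
The abstract does not give formulas, so the following is only a minimal NumPy sketch of two of the ideas it names: pooling the feature maps of two different streams into a cross bilinear vector, and combining base classifiers by majority or weighted voting. The function names `cross_bilinear_vector` and `vote`, the averaging over spatial locations, and the signed square root with L2 normalisation (a common choice for bilinear features) are assumptions made for illustration, not details confirmed by the thesis.

```python
from typing import Optional

import numpy as np


def cross_bilinear_vector(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Pool two feature maps of shape (C, H, W) into one bilinear vector.

    Assumption: the maps come from two different streams and share H and W.
    """
    c_a, h, w = feat_a.shape
    c_b = feat_b.shape[0]
    if feat_b.shape[1:] != (h, w):
        raise ValueError("both streams must share the same spatial size")
    a = feat_a.reshape(c_a, h * w)
    b = feat_b.reshape(c_b, h * w)
    # Outer product of the two streams at each location, averaged over locations.
    bilinear = (a @ b.T) / (h * w)  # shape (c_a, c_b)
    vec = bilinear.reshape(-1)
    # Signed square root, then L2 normalisation (assumed post-processing).
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def vote(probs: np.ndarray, weights: Optional[np.ndarray] = None) -> int:
    """Combine base classifiers; probs has shape (n_classifiers, n_classes).

    With weights=None every classifier casts one vote (majority voting);
    otherwise each vote counts with its classifier's weight (weighted voting).
    Ties go to the lowest class index.
    """
    n_classes = probs.shape[1]
    labels = probs.argmax(axis=1)  # hard prediction of each base classifier
    votes = np.bincount(labels, weights=weights, minlength=n_classes)
    return int(votes.argmax())
```

For example, two streams with 4 and 5 channels on a 3x3 grid give a vector of length 20, and `vote` can be called once without weights and once with a per-classifier weight array to compare the two integration strategies.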
Keywords/Search Tags:Fine-grained Visual Classification, Multi-stream, Cross Bilinear Vector, Multi-scale Feature Fusion, Bagging, Actor-Critic