Fine-grained Image Classification Based On Adaptive Interactive Selection And Attention

Posted on: 2023-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: F Yuan
GTID: 2568306623980839
Subject: Computer technology
Abstract/Summary:
Fine-grained image recognition distinguishes between subclasses of the same category. Because the visual differences among these subclasses are subtle, fine-grained classification is more challenging than general image classification, and learning discriminative local features from images is critical to solving it. Among existing fine-grained classification approaches, the bilinear model and the destruction-construction model are considered two of the most effective methods for handling subtle inter-class differences. The bilinear model learns discriminative features through feature interaction without relying on additional annotation. The destruction-construction model generates a destructed image by disrupting the global semantic information of the original image, which forces the network to learn discriminative features in local areas. However, both models have serious problems: 1) in the bilinear model, the feature set used for interaction is pre-defined, and the complementary information within multi-scale features is not utilized; 2) the destruction-construction model ignores multi-granularity local information and fails to eliminate the edge noise of each local region in the destructed image. To address these problems, this thesis carries out the following research.

To address problem 1, this thesis proposes a Multi-scale Selective Hierarchical biQuadratic Pooling (MSHQP) model. The model improves the way features interact across convolutional layers of multiple semantic levels and adaptively learns the optimal interaction subset for a specific dataset. First, this thesis proposes a biquadratic pooling module, which models the inter-layer relationship between different convolutional layers via the Hadamard product and the intra-layer correlation between different channels of one layer via bilinear pooling. Second, this thesis extracts convolutional features of different scales from multiple semantic levels and uses the complementary information among those levels to further enhance the performance of biquadratic pooling. Finally, this thesis proposes a sparse interaction selection module, guided by supervision information, that adaptively learns the optimal interaction subset from all candidate sets for a specific dataset.

To address problem 2, this thesis proposes an Attention-based Multi-granularity Region Confusion (AMRC) model. The model aims to extract the discriminative information in local regions of an image in order to improve recognition accuracy. First, this thesis proposes a multi-granularity region confusion mechanism, which divides the original image into multiple sub-regions and shuffles them randomly so that the neural network can learn multi-granularity discriminative features of the image. Second, this thesis designs a multi-scale spatial attention fusion module that eliminates the region edge noise introduced by the region confusion mechanism and helps the network locate multi-granularity discriminative regions more accurately.

In summary, this thesis proposes two general fine-grained classification models, devised from the perspectives of optimizing feature interaction and focusing on local image regions, respectively. Experimental results on multiple public datasets demonstrate the strong performance of the proposed models.
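The abstract names two operations inside the biquadratic pooling module: a Hadamard (element-wise) product for inter-layer interaction and bilinear pooling for intra-layer channel correlation. The following is a minimal sketch of those two generic operations only, not the MSHQP model itself. It assumes that the two layers have already been projected to the same number of channels and the same spatial positions, and that the Hadamard fusion is applied before bilinear pooling; the abstract does not specify either point, and the function names are hypothetical.

```python
def hadamard(x, y):
    """Element-wise product of two feature vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return [a * b for a, b in zip(x, y)]


def bilinear_pool(features):
    """Sum the outer product x x^T over all spatial locations.

    `features` is a list of per-location channel vectors; the result is a
    C x C matrix of channel co-activations.
    """
    channels = len(features[0])
    pooled = [[0.0] * channels for _ in range(channels)]
    for x in features:
        for i in range(channels):
            for j in range(channels):
                pooled[i][j] += x[i] * x[j]
    return pooled


def inter_then_intra(layer_a, layer_b):
    """Assumed ordering: fuse two layers location by location with a
    Hadamard product, then bilinear-pool the fused features."""
    fused = [hadamard(xa, xb) for xa, xb in zip(layer_a, layer_b)]
    return bilinear_pool(fused)


# Two toy "layers": 2 spatial locations, 2 channels each.
layer_a = [[1.0, 2.0], [3.0, 4.0]]
layer_b = [[1.0, 0.5], [2.0, 1.0]]
pooled = inter_then_intra(layer_a, layer_b)
```

In practice the pooled matrix is usually flattened and normalized before classification; that step is omitted here because the abstract does not describe it.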
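The region confusion step described in the abstract (divide the image into sub-regions and shuffle them randomly, at more than one granularity) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the AMRC implementation: the image is a plain 2D list of pixel values, each granularity is an n x n grid, the shuffle is a uniform random permutation of patches, and the names `shuffle_regions` and `multi_granularity_views` as well as the default grid sizes are hypothetical.

```python
import random


def shuffle_regions(image, n, rng=None):
    """Split a 2D image (list of rows) into an n x n grid and shuffle the patches."""
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    if height % n or width % n:
        raise ValueError("image height and width must be divisible by n")
    ph, pw = height // n, width // n

    # Cut the image into patches in row-major order.
    patches = [
        [row[c * pw:(c + 1) * pw] for row in image[r * ph:(r + 1) * ph]]
        for r in range(n)
        for c in range(n)
    ]
    rng.shuffle(patches)

    # Reassemble the shuffled patches into an image of the original size.
    out = []
    for r in range(n):
        row_patches = patches[r * n:(r + 1) * n]
        for i in range(ph):
            out.append([px for patch in row_patches for px in patch[i]])
    return out


def multi_granularity_views(image, grid_sizes=(2, 4), seed=0):
    """Return one shuffled copy of the image per grid size (granularity)."""
    rng = random.Random(seed)
    return {n: shuffle_regions(image, n, rng) for n in grid_sizes}


# A 4 x 4 toy image with pixel values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
views = multi_granularity_views(image, grid_sizes=(1, 2, 4), seed=0)
```

Each view keeps the original size and pixel content but breaks the global layout at a different scale. The patch borders created by the shuffle are the kind of region edge noise that the abstract says the spatial attention fusion module is designed to suppress.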
Keywords/Search Tags: Fine-grained Image Classification, Biquadratic Pooling, Multi-scale Features, Adaptive Selection, Regional Confusion, Spatial Attention