Research On Fine-grained Image Classification Based On Deep Convolutional Neural Network And Dual-domain Attention Mechanism

Posted on: 2021-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Hou
Full Text: PDF
GTID: 2428330611957084
Subject: Communication and Information System
Abstract/Summary:
Image classification is one of the most basic research tasks in computer vision. In recent years, driven by changes in everyday needs and by market factors, researchers have gradually shifted their focus from large-scale coarse-grained image classification (categories such as cats, dogs, flowers, and birds) to fine-grained image classification over the subcategories of these basic categories. However, because fine-grained image samples are extremely similar to one another, deep convolutional neural networks that perform well on traditional image classification tasks struggle on fine-grained ones. How to construct fine-grained image classification networks with high classification performance, low computational complexity, and strong transferability has become both the focus and the main difficulty of this research field. The goal of this thesis is to construct weakly supervised fine-grained image classification network models with high classification performance that can be trained and tested end to end using only image-level category labels. By combining the idea of a multi-scale feature pyramid with a dual-domain attention mechanism, the thesis improves the existing classic models B-CNN (Bilinear CNN) and CBP (Compact Bilinear Pooling). The main research progress and achievements are as follows:

(1) Current mainstream fine-grained image classification networks classify using only the top-level features extracted by a single convolutional layer, which costs recognition accuracy. Starting from the working mechanism of convolutional neural networks, this thesis constructs a multi-scale feature pyramid fusion network that extracts and fuses first-order and second-order features from both the shallow and the deep layer outputs of the network to represent the global and local discriminative region information of fine-grained images. Bottleneck layer modules built from different numbers of 1x1 convolution kernels and batch normalization layer modules are then embedded in the network structure, and extensive ablation experiments are performed on three public fine-grained image benchmark datasets: CUB-200-2011, Stanford Cars, and FGVC-Aircraft. The experimental results show that the multi-scale feature pyramid fusion network greatly improves the classification performance of the two baseline models, B-CNN and CBP. Further embedding the bottleneck and batch normalization modules effectively reduces the dimensionality of the multi-layer features, greatly reduces the number of model parameters, and accelerates training convergence.

(2) During training, existing mainstream fine-grained image classification networks only pass the feature maps extracted by each convolutional layer forward layer by layer in a cascade. This process ignores the finer-grained distribution of feature values in the initial convolutional feature maps, which creates a bottleneck in the classification performance of the network models. This thesis introduces attention in both the channel domain and the spatial domain of the convolutional feature maps and, by also considering dual-domain mixing, designs four efficient and flexible attention modules. To verify the performance of each attention module, it is embedded into each of the feature fusion networks constructed above, and experiments are performed on the CUB-200-2011, Stanford Cars, and FGVC-Aircraft datasets. The experimental results show that embedding these lightweight, general-purpose attention modules in the network structure steadily improves the classification performance of each initial feature fusion network. In particular, the dual-domain mixed attention networks based on serial cascading achieve classification accuracies of 86.2%, 93.02%, and 91.0% on the corresponding test sets, outperforming Mask-CNN, RA-CNN, HIHCA, and other well-known algorithms. These results verify that the dual-domain mixed attention feature fusion network is a high-performing fine-grained image classification algorithm based on weakly supervised information.
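
As a rough illustration of the pipeline described in (2), the sketch below applies channel attention and then spatial attention in series to one convolutional feature map, optionally reduces channels with a 1x1 bottleneck, and forms a B-CNN-style second-order descriptor by bilinear pooling. The abstract does not specify the internals of the four attention modules, so the squeeze-and-excitation-style channel gate, the mean-based spatial gate, and all function names, weight shapes, and sizes here are assumptions for illustration, not the thesis's actual design.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bottleneck_1x1(feat, weight):
    """1x1 convolution as a per-location channel mix: (C,H,W) -> (C_out,H,W)."""
    return np.einsum("oc,chw->ohw", weight, feat)


def channel_attention(feat, w1, w2):
    """Assumed SE-style gate: one weight per channel from global average pooling."""
    squeezed = feat.mean(axis=(1, 2))          # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # ReLU, shape (C_hidden,)
    gate = sigmoid(w2 @ hidden)                # (C,), values in (0, 1)
    return feat * gate[:, None, None]


def spatial_attention(feat):
    """Assumed spatial gate: one weight per location from the channel-wise mean."""
    gate = sigmoid(feat.mean(axis=0))          # (H, W)
    return feat * gate[None, :, :]


def bilinear_pool(feat_a, feat_b):
    """Outer product pooled over locations, then signed sqrt and L2 normalization."""
    a = feat_a.reshape(feat_a.shape[0], -1)    # (C_a, H*W)
    b = feat_b.reshape(feat_b.shape[0], -1)    # (C_b, H*W)
    vec = ((a @ b.T) / a.shape[1]).reshape(-1) # (C_a * C_b,)
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    return vec / (np.linalg.norm(vec) + 1e-12)


def serial_dual_attention_descriptor(feat, w1, w2):
    """Serial cascade: channel attention, then spatial attention, then pooling."""
    attended = spatial_attention(channel_attention(feat, w1, w2))
    return bilinear_pool(attended, attended)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 4, 4))      # toy feature map, C=8, H=W=4
    w1 = rng.standard_normal((2, 8))           # hypothetical reduction weights
    w2 = rng.standard_normal((8, 2))           # hypothetical expansion weights
    print(serial_dual_attention_descriptor(feat, w1, w2).shape)
```

With C channels the bilinear descriptor has C x C entries, so a 1x1 bottleneck that halves C cuts the descriptor length to a quarter. That is one plausible reading of the dimensionality and parameter reduction the abstract reports.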
Keywords/Search Tags: Fine-grained Image Classification, Deep Convolutional Neural Network, Weak Supervision, Feature Pyramid, Attention Mechanism