Fine-grained image classification aims to distinguish visually similar subcategories within the same basic-level category, such as different species of birds or different types of vehicles and aircraft. The main challenges of this task are large intra-class variation, small inter-class variation, and the difficulty of locating regions that carry discriminative information. Current classic fine-grained classification methods improve accuracy by localizing the most discriminative regions of the target object in the image. However, these methods have complex training procedures and many parameters that must be tuned by manual cross-validation, which limits the practicality of traditional fine-grained classification algorithms. To address these problems, this paper performs fine-grained image classification with an end-to-end, single-stage network. The main research contents are as follows:

(1) Deep networks respond mainly to image areas where information changes sharply, and the target object usually lies in such areas. To help the network extract more effective features, this paper introduces a mask operator that identifies the key features in an image. Because different channels of the network respond differently, this paper further proposes a channel-adaptive mask attention mechanism to improve the generalization ability of the model. ResNet34 is used as the feature extractor, and the higher-order feature learning module of the HBPASM network is used as the classification network. Experiments on the CUB-200-2011 bird dataset confirm the effectiveness of this method.

(2) Most of the discriminative information in fine-grained images is located in local regions, so the way a neural network learns regions of interest affects classification performance to a certain extent. This paper therefore aggregates the channel-adaptive masks from different convolutional layers to obtain the regions of interest of different channels and to improve the discriminative learning ability of the model. At the same time, the method models higher-order feature information of fine-grained images to form a strong feature representation. This improves the network's ability to learn channel-wise regions of interest, strengthens its ability to distinguish fine-grained features, and enables end-to-end training and classification of fine-grained images.

(3) Fusing the channel-adaptive masks with the regions of interest yields more discriminative features. Extensive experiments on three common datasets, CUB-200-2011, Stanford Cars, and FGVC-Aircraft, show that the method improves classification accuracy while keeping the model simple and its inference efficient.
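The difference between a single mask shared by all channels and a channel-adaptive mask, as described in point (1), can be illustrated with a small sketch. It is a hypothetical illustration, not the mechanism evaluated in this paper: it assumes binary masks obtained by thresholding activations at their mean, a feature map stored as nested lists indexed `[channel][row][col]`, and the illustrative function names `channel_adaptive_mask`, `shared_mask` and `apply_mask`. The shared baseline thresholds the channel-summed activation map once; the channel-adaptive version thresholds each channel against its own mean.

```python
from typing import List

# Layout assumption (hypothetical): features[channel][row][col] for a single image.
FeatureMap = List[List[List[float]]]


def channel_adaptive_mask(features: FeatureMap) -> FeatureMap:
    """One binary mask per channel: keep positions at or above that channel's mean."""
    masks = []
    for channel in features:
        values = [v for row in channel for v in row]
        threshold = sum(values) / len(values)  # assumed thresholding rule
        masks.append([[1.0 if v >= threshold else 0.0 for v in row] for row in channel])
    return masks


def shared_mask(features: FeatureMap) -> FeatureMap:
    """Baseline: one mask from the channel-summed map, copied to every channel."""
    height, width = len(features[0]), len(features[0][0])
    summed = [[sum(ch[i][j] for ch in features) for j in range(width)] for i in range(height)]
    threshold = sum(v for row in summed for v in row) / (height * width)
    single = [[1.0 if v >= threshold else 0.0 for v in row] for row in summed]
    return [[row[:] for row in single] for _ in features]


def apply_mask(features: FeatureMap, masks: FeatureMap) -> FeatureMap:
    """Element-wise product of features and masks, channel by channel."""
    return [
        [[f * m for f, m in zip(f_row, m_row)] for f_row, m_row in zip(f_ch, m_ch)]
        for f_ch, m_ch in zip(features, masks)
    ]


if __name__ == "__main__":
    feats = [[[4.0, 0.0], [0.0, 0.0]],   # channel 0: one strong response
             [[0.0, 1.0], [1.0, 2.0]]]   # channel 1: weaker, spread-out responses
    print(shared_mask(feats)[1])            # [[1.0, 0.0], [0.0, 1.0]]
    print(channel_adaptive_mask(feats)[1])  # [[0.0, 1.0], [1.0, 1.0]]
    print(apply_mask(feats, channel_adaptive_mask(feats)))
```

In this toy example, the shared mask discards positions (0, 1) and (1, 0) for channel 1 because channel 0 dominates the summed map, whereas the per-channel threshold keeps them.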
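Point (2) combines masks from several convolutional layers with higher-order (bilinear) features. The following is a simplified, hypothetical sketch rather than the paper's network. Masks from two layers are merged by element-wise union, which is an assumed aggregation rule. The masked features are then pooled with a cross-layer Hadamard-product bilinear pooling in the spirit of hierarchical bilinear pooling, with the learned projection matrices omitted. The sketch assumes both layers have the same number of channels and the same spatial size, with each channel's positions flattened into a list. The signed square root and L2 normalization are the usual post-processing steps for bilinear features. The names `aggregate_masks` and `masked_cross_layer_pooling` are illustrative.

```python
import math
from typing import List

# Layout assumption (hypothetical): features[channel][position], spatial positions flattened.
Matrix = List[List[float]]


def aggregate_masks(mask_a: Matrix, mask_b: Matrix) -> Matrix:
    """Assumed aggregation rule: union (element-wise max) of two binary masks."""
    return [[max(a, b) for a, b in zip(row_a, row_b)] for row_a, row_b in zip(mask_a, mask_b)]


def masked_cross_layer_pooling(feat_a: Matrix, feat_b: Matrix, mask: Matrix) -> List[float]:
    """Per channel, sum over masked positions of the product of the two layers' features."""
    pooled = [
        sum(m * a * b for m, a, b in zip(m_row, a_row, b_row))
        for m_row, a_row, b_row in zip(mask, feat_a, feat_b)
    ]
    # Signed square root, then L2 normalization of the pooled vector.
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


if __name__ == "__main__":
    layer_a = [[1.0, 2.0], [3.0, 0.0]]  # 2 channels, 2 flattened positions
    layer_b = [[2.0, 1.0], [1.0, 4.0]]
    mask = aggregate_masks([[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]])
    print(mask)  # [[1.0, 0.0], [1.0, 1.0]]
    print(masked_cross_layer_pooling(layer_a, layer_b, mask))
```

Using a union keeps any position that either layer marks as relevant. An intersection would be a stricter alternative, and which rule works better is an empirical question.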