| Fine-grained image recognition is an important branch of computer vision. In recent years, with the rapid development of the field, fine-grained image recognition technology has shown strong value in catering, online shopping, education, urban traffic, and other areas. Fine-grained image recognition techniques can be divided into strongly supervised and weakly supervised approaches. Strongly supervised fine-grained image recognition requires manual annotation of the regions to be recognized in each image before training; this preparation is time-consuming and labor-intensive, and the annotation demands a high degree of expertise. Therefore, this thesis investigates fine-grained image recognition that does not rely on additional manual annotation. To achieve higher recognition accuracy, finer position detection and finer feature extraction must be performed on the target object. In this thesis, a deep convolutional neural network framework is used, and the feature extraction capability of the network is strengthened with a discriminative-region localization network, a feature enhancement network, a discriminative-region masking network, and a bilinear attention module. The main research contents are as follows. (1) The research background and significance of the fine-grained image recognition problem are reviewed, together with domestic and international research methods, leading to the conclusion that weakly supervised methods are the mainstream direction of current research. The deep learning techniques related to fine-grained image recognition, two common attention mechanisms in computer vision, and common datasets in the field are introduced. (2) To address the low recognition accuracy of convolutional neural networks on the training data, this thesis adopts transfer learning and uses EfficientNetV2 as the backbone network to perform image recognition on each of three fine-grained image datasets. During transfer learning, all weight parameters except those of the fully connected layer are frozen for the first ten training epochs, and all weight parameters are unfrozen for the subsequent epochs. Comparison experiments with other network algorithms and visual analysis experiments on our own dataset show that transfer learning based on EfficientNetV2 achieves strong image recognition performance. (3) To address the difficulty that weakly supervised fine-grained image recognition algorithms have in capturing the most discriminative feature regions in an image, an Attention Feature Extraction Network (AFEN) is proposed to strengthen feature extraction from the most discriminative regions. Feature enhancement is used to increase the weight the network places on the relevant feature map channels, flexible pooling is used to reduce information loss, and a localization discriminative region network and a masking discriminative region network are designed to correct the network's internal parameters by interacting with the backbone network. A class center loss and two complementarity losses, Ls1 and Ls2, are designed and applied together with the classification loss to improve this correction. Experiments on the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets show that AFEN achieves higher fine-grained image recognition accuracy than the other network structures compared, and that it performs better at localizing discriminative regions and extracting enhanced feature information. |
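
The two-stage freezing schedule described in item (2) can be illustrated with a small, framework-free sketch. It is not the thesis implementation: the `Param` class is a stand-in for a deep learning framework's parameter object, the `classifier.` prefix used to identify the fully connected layer is a hypothetical naming convention, and counting epochs from zero is an assumption. Only the ten-epoch freeze period comes from the abstract.

```python
from dataclasses import dataclass

# From the abstract: the backbone stays frozen for the first ten epochs.
FREEZE_EPOCHS = 10


@dataclass
class Param:
    """Stand-in for a framework parameter that carries a trainable flag."""
    name: str
    requires_grad: bool = True


def apply_freeze_schedule(params, epoch, head_prefix="classifier.",
                          freeze_epochs=FREEZE_EPOCHS):
    """Set each parameter's trainable flag for the given 0-based epoch.

    Parameters whose name starts with `head_prefix` (assumed here to be the
    fully connected layer) are always trainable. All other parameters become
    trainable only once `freeze_epochs` epochs have elapsed.
    Returns the names of the parameters that are trainable in this epoch.
    """
    backbone_trainable = epoch >= freeze_epochs
    for p in params:
        is_head = p.name.startswith(head_prefix)
        p.requires_grad = is_head or backbone_trainable
    return [p.name for p in params if p.requires_grad]


# Hypothetical parameter names, chosen only for illustration.
params = [
    Param("features.0.weight"),
    Param("features.1.weight"),
    Param("classifier.1.weight"),
    Param("classifier.1.bias"),
]

stage1 = apply_freeze_schedule(params, epoch=0)   # backbone frozen
stage2 = apply_freeze_schedule(params, epoch=10)  # everything unfrozen
```

In a real training loop the same function would be called at the start of every epoch, before the optimizer step, so that the switch from training only the fully connected layer to fine-tuning the whole network happens automatically at the eleventh epoch.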