Fine-grained image recognition has become an active research topic in recent years. It focuses on distinguishing subcategories within a larger category, such as breeds of dogs or species of birds. However, due to factors such as pose, color, and environment, images from different subcategories can look very similar, while images within the same subcategory can differ significantly. Fine-grained images are therefore characterized by small inter-class differences and large intra-class differences. Several problems in fine-grained image recognition remain open, including the impact of long-tail distributions on recognition performance, the difficulty of extracting discriminative features from multiple regions, and the lack of research on jointly learning key and auxiliary discriminative regions. In response to these issues, this paper designs several weakly supervised fine-grained image recognition networks based on deep learning. The specific work is as follows:

(1) A fine-grained image classification network based on variable-weight Focal Loss. Long-tail data limits the practical application of fine-grained recognition models, because network training tends to be biased towards head-class samples and to neglect most tail-class samples. To solve this problem, a variable-weight Focal Loss function is designed that reduces the negative impact of the long-tail data distribution on the model by weighting difficult samples. In addition, the bottleneck of MobileNet V2 is improved, yielding a new feature extraction module, CA-bottleneck. Experiments show that this method can significantly improve recognition accuracy on fine-grained image datasets with a long-tail data distribution.

(2) A fine-grained image classification network based on multi-region attention. The current difficulty of fine-grained image classification lies in accurately locating both the highly distinguishable local regions and the other auxiliary discriminative features in an image. First, Inception V3 is used to extract image features, and repeated attention erasure forces the model to focus on secondary features. Then, more accurate local images are obtained through background removal and upsampling, and the local and overall images are cascaded for more detailed learning. In addition, a joint loss function is designed that improves recognition performance by dynamically balancing difficult and easy samples and by reducing intra-class differences. On public datasets, this method achieves higher accuracy than the methods it is compared against.

(3) A fine-grained image classification network based on reinforcement complementary learning. To integrate more detailed learning of key regions with learning of auxiliary discriminative regions, Inception V3 is used as the feature extraction network, and a drive module drives the model to perform reinforcement learning and complementary learning. While reinforcement learning is used to obtain more detailed fine-grained image features, the complementary network uses attention erasure to obtain supplementary discriminative regions of the target, thereby increasing the network’s perception of the target as a whole. Experiments show that this method achieves strong performance and recognition accuracy, demonstrating its effectiveness.
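
Two of the ideas named above, a weighted Focal Loss and attention erasure, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the variable-weight rule, the CA-bottleneck design, or the erasure procedure. The sketch assumes the standard focal loss form, -w_t (1 - p_t)^gamma log(p_t), with a hypothetical per-class weight list `class_weights`, and a hypothetical thresholded erasure function `erase_attended` that zeroes out the most attended pixels so that a later pass must rely on secondary regions. All function names, parameters, and default values are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, class_weights=None, gamma=2.0):
    """Standard focal loss for one sample: -w_t * (1 - p_t)**gamma * log(p_t).

    class_weights is a hypothetical per-class weighting (for example, larger
    values for tail classes). The paper's variable-weight rule is not stated
    in the abstract, so a fixed weight per class is assumed here.
    """
    p_t = softmax(logits)[target]
    w_t = 1.0 if class_weights is None else class_weights[target]
    # (1 - p_t)**gamma shrinks the loss of easy, well-classified samples.
    return -w_t * (1.0 - p_t) ** gamma * math.log(p_t)


def erase_attended(image, attention, threshold=0.5):
    """Zero out pixels whose attention value exceeds the threshold.

    image and attention are 2-D lists of equal shape. The threshold value is
    an assumption made for illustration only.
    """
    return [
        [0.0 if a > threshold else px for px, a in zip(img_row, att_row)]
        for img_row, att_row in zip(image, attention)
    ]


if __name__ == "__main__":
    # A confidently correct sample contributes far less than a misclassified one.
    print(focal_loss([5.0, 0.0], target=0))
    print(focal_loss([0.0, 5.0], target=0))
    # The highly attended pixels are removed; the rest are kept.
    print(erase_attended([[1.0, 2.0], [3.0, 4.0]], [[0.9, 0.1], [0.2, 0.8]]))
```

With `gamma=0` and no class weights, `focal_loss` reduces to ordinary cross-entropy. Raising `gamma` shifts the training signal towards difficult samples, which is the property the abstract relies on for long-tail data.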