Fine-grained image classification is an important research direction in computer vision. Compared with general image classification, the objects in fine-grained image classification usually belong to the same domain, and the features of different categories are similar; at the same time, fine-grained image datasets exhibit a pronounced long-tail distribution, which makes classification difficult. Current fine-grained image classification methods based on convolutional neural networks have limited ability to discriminate fine-grained features because of their large receptive field. Convolutional neural networks are also weak at global feature extraction, so the classification performance of the related algorithms tends to be limited. In recent years, the Transformer, which originated in natural language processing, has been adopted in computer vision; its powerful information summarization and feature extraction capabilities have broken the performance bottleneck of many image classification tasks and opened new possibilities for improving fine-grained image classification. This thesis introduces the Transformer into fine-grained image classification and uses it to address two key problems: fine-grained feature extraction and long-tail distribution optimization. The thesis is based on the enterprise-institution research cooperation project "Research and development of edge intelligence technology and system equipment for smart factory", which addresses the technical problems of deploying artificial intelligence in smart factories from the perspective of practical application. The main work of the thesis is as follows.

1) Related research on fine-grained image classification is reviewed. First, computer vision techniques are introduced, and then typical application scenarios and publicly available fine-grained datasets for fine-grained image classification
tasks are outlined. Then, research on fine-grained image classification methods is summarized and analyzed from two aspects: fine-grained feature extraction methods and long-tail distribution optimization methods. Finally, the current status and challenges of Transformer research for fine-grained image classification are presented.

2) A Transformer image classification method oriented to fine-grained feature extraction is proposed. In fine-grained image classification, the low signal-to-noise ratio of the data, large intra-class variance, and subtle inter-class differences make fine-grained features difficult to extract; to address these problems, a Transformer method based on adaptive attention is proposed. First, the main attention region of the Transformer is captured by key feature extraction. Then, an adaptive attention method is designed that adjusts the attention of the Transformer through an attention weakening module, an attention enhancement module, and an adaptive loss function. The performance and effectiveness of the proposed method are verified by experiments. The experimental results show that the proposed method improves the fine-grained feature extraction ability of the backbone network, effectively reduces confusion between similar categories, enhances the recognition of key features, and improves the accuracy of fine-grained image classification.

3) A Transformer image classification method for long-tail distributions is proposed. For the long-tail distribution problem in fine-grained image classification, a Transformer method based on multi-scale feature optimization is proposed to protect low-level and deep features and to optimize the long-tail distribution. First, a hybrid data sampling method is designed to obtain triplet data for optimizing representation learning, the long-tail distribution, and fine-grained features. Then, a Transformer multi-scale feature optimization method is designed to optimize the feature learning process by
the low-level feature contrastive learning method and the deep feature balanced learning method, respectively, to reduce category confusion and improve fine-grained feature extraction, and to increase attention to tail categories while protecting feature learning for head categories. The feasibility and effectiveness of the proposed method are verified by experiments. The experimental results show that the proposed method can effectively mitigate the impact of the long-tail distribution in fine-grained image classification tasks, optimize the feature distribution, and improve classification accuracy.
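
The abstract does not specify how the hybrid data sampling in 3) is implemented. As a generic illustration of the underlying idea only, the following minimal sketch blends uniform (instance-balanced) sampling with inverse-frequency (class-balanced) sampling, so that tail categories are drawn more often while head categories are still represented. The function names and the mixing parameter `alpha` are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def class_balanced_weights(labels):
    """Per-sample weights proportional to 1 / (size of the sample's class).

    After normalization, every class receives the same total probability mass.
    """
    counts = Counter(labels)
    raw = [1.0 / counts[y] for y in labels]
    total = sum(raw)
    return [w / total for w in raw]


def hybrid_weights(labels, alpha=0.5):
    """Blend uniform and class-balanced sampling weights.

    alpha = 0 gives plain uniform sampling over samples;
    alpha = 1 gives fully class-balanced sampling.
    (alpha is an illustrative assumption, not a parameter from the thesis.)
    """
    n = len(labels)
    balanced = class_balanced_weights(labels)
    return [(1.0 - alpha) / n + alpha * b for b in balanced]


# Toy long-tailed label set: 8 head samples, 2 tail samples.
labels = ["head"] * 8 + ["tail"] * 2
weights = hybrid_weights(labels, alpha=0.5)
```

Weights of this kind could be passed to any weighted sampler, for example `random.choices(range(len(labels)), weights=weights, k=batch_size)`, to draw training batches.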