Fine-grained image classification is an important research field in computer vision. Unlike ordinary image classification, the images used in fine-grained classification have small inter-class differences, while factors such as occlusion, illumination, and pose can make the intra-class differences within a single subcategory large. The difficulty of fine-grained image classification therefore lies in locating the target object while extracting discriminative feature information. Strongly supervised fine-grained classification algorithms use additional annotation information to locate the target region; they achieve high classification accuracy but at a high annotation cost, which limits their practicality. Weakly supervised fine-grained classification algorithms alleviate this annotation cost to a certain extent. To focus effectively on local region information, this paper proposes an enhanced self-attention distillation network (ES-ADNet) that obtains features closer to the target. The main contributions are as follows:

(1) Training convolutional neural networks requires a large number of positive samples. This paper combines two data augmentation methods, CutMix and AC (Attention Crop). CutMix directly performs a cropping and mixing operation on input images to form new augmented images. Attention Crop is an attention-guided cropping and zooming operation that produces close-up views of the object at different scales; these views are fused with the original input image to train the network.

(2) To address insufficient feature extraction, this paper proposes self-attention distillation to transfer knowledge between layers of different depths. The ResNet backbone is divided into blocks, forming four sub-classifiers that classify features at different depths, and the deepest classifier, which has passed through multiple attention levels, guides the learning of the shallower sub-classifiers. Through attention distillation, the network attends more to locally discriminative regions, reducing the influence of irrelevant background on the classification task.

(3) To address the problem that non-target classes can affect target-class prediction, this paper decouples the self-attention distillation loss into target-class and non-target-class components. Combining self-attention distillation with this decoupled self-distillation loss yields the self-attention distillation network (S-ADNet). Experimental results on the CUB-200-2011 and Stanford Dogs datasets show that the proposed method significantly improves classification accuracy.
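The CutMix operation described in contribution (1) can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: the beta-distribution parameter, array layout, and function names are illustrative assumptions.

```python
import numpy as np

def rand_bbox(h, w, lam):
    """Sample a box whose area fraction is roughly (1 - lam)."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    return y1, y2, x1, x2

def cutmix(batch, labels, alpha=1.0):
    """Mix each image with a randomly chosen partner image.

    batch:  (N, H, W, C) float array; labels: (N,) int array.
    Returns the mixed images, both label sets, and the mixing weight,
    so the loss can be computed as lam * CE(y_a) + (1 - lam) * CE(y_b).
    """
    n, h, w, _ = batch.shape
    lam = np.random.beta(alpha, alpha)
    perm = np.random.permutation(n)
    y1, y2, x1, x2 = rand_bbox(h, w, lam)
    mixed = batch.copy()
    # Paste the sampled region from the partner image into each image.
    mixed[:, y1:y2, x1:x2, :] = batch[perm, y1:y2, x1:x2, :]
    # Adjust lam to the exact pasted-area fraction.
    lam = 1.0 - ((y2 - y1) * (x2 - x1) / (h * w))
    return mixed, labels, labels[perm], lam
```

The returned pair of label sets makes the augmentation label-aware: the classification loss is interpolated with the same weight `lam` used to mix the pixels.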
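The decoupling in contribution (3) splits a knowledge-distillation loss into a target-class term and a non-target-class term so the two can be weighted independently. The abstract does not give the exact loss form; the sketch below shows one common way to perform such a decomposition, with illustrative weights `alpha`, `beta`, and temperature `T` that are assumptions, not the paper's settings.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def decoupled_kd_loss(student_logits, teacher_logits, target,
                      alpha=1.0, beta=1.0, T=1.0):
    """Decompose KD into a target-class KL term and a KL term over
    the renormalized non-target classes, weighted by alpha and beta."""
    n, c = student_logits.shape
    ps = softmax(student_logits / T)
    pt = softmax(teacher_logits / T)
    idx = np.arange(n)
    eps = 1e-12
    # Target-class term: KL between binary [p(target), p(not target)].
    bs = np.stack([ps[idx, target], 1.0 - ps[idx, target]], axis=1)
    bt = np.stack([pt[idx, target], 1.0 - pt[idx, target]], axis=1)
    tckd = np.sum(bt * (np.log(bt + eps) - np.log(bs + eps)), axis=1)
    # Non-target term: KL between distributions renormalized over
    # the non-target classes only.
    mask = np.ones((n, c), dtype=bool)
    mask[idx, target] = False
    ns = ps[mask].reshape(n, c - 1)
    nt = pt[mask].reshape(n, c - 1)
    ns = ns / ns.sum(axis=1, keepdims=True)
    nt = nt / nt.sum(axis=1, keepdims=True)
    nckd = np.sum(nt * (np.log(nt + eps) - np.log(ns + eps)), axis=1)
    return float(np.mean(alpha * tckd + beta * nckd))
```

Because the non-target distribution is renormalized separately, the teacher's knowledge about relationships among non-target classes no longer dilutes the supervision on the target class, which is the motivation stated in the abstract.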