| Compared with traditional image classification, fine-grained classification is more challenging because some categories differ only slightly. In insect scenes the task is harder still, since the target individuals are small and background interference is strong. Traditional methods for fine-grained tasks also have limitations during training: the features they generate depend only on a single input, and traditional attention modules lack global information. In view of the characteristics of fine-grained classification in insect scenes and the limitations of traditional methods, the main work of this thesis is as follows: 1. A new data augmentation scheme is proposed to address small insect targets and background interference in insect scenes. In this scheme, a base network is first trained on the insect fine-grained dataset to extract the salient regions of the input. The background area is then processed to reduce its influence, or the salient area is enlarged to alleviate the problem of subtle insect features. Finally, the augmented result is fused with the original features to strengthen the feature semantics. 2. To alleviate the imbalance in the number of samples among categories in the dataset, this thesis proposes a batch sampling strategy. During training, n categories are first selected at random, and images of those categories are then drawn at random from the training set according to a specific strategy. All samples drawn in this way form the batch for one training step. 3. To address the shortcoming that the attention mechanism in traditional methods is limited to a single sample, an attention structure that exploits inter-sample information is proposed. The structure first uses a batch of samples to generate preliminary channel attention weights, and then fuses them with global channel weights associated with the sample information of the full set of categories to generate the final channel attention vector, which weights the sample features. 4. Multiple loss functions are combined to train the model, so that the network learns to extract image features with better semantics. All of the structures proposed above have been verified experimentally. On the insect dataset built in this thesis, the best results are obtained with the model based on ResNet-101. |
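
The batch sampling strategy in point 2 can be illustrated with a minimal sketch. The abstract does not specify how images are drawn within each selected category, so the code below makes assumptions: it draws a fixed number of images per category (`k_per_class`) and falls back to sampling with replacement when a category has too few images. The function names `build_class_index` and `sample_balanced_batch` are hypothetical and are not taken from the thesis.

```python
import random
from collections import defaultdict


def build_class_index(labels):
    """Map each class label to the list of sample indices that carry it."""
    index = defaultdict(list)
    for i, label in enumerate(labels):
        index[label].append(i)
    return dict(index)


def sample_balanced_batch(class_index, n_classes, k_per_class, rng=random):
    """Draw one batch: n_classes random categories, k_per_class images each.

    Assumption (not stated in the source): categories with fewer than
    k_per_class images are sampled with replacement, so rare categories
    contribute as many samples to the batch as frequent ones.
    """
    # Step 1: pick n categories uniformly at random, regardless of their size.
    classes = rng.sample(sorted(class_index), n_classes)

    batch = []
    for c in classes:
        pool = class_index[c]
        if len(pool) >= k_per_class:
            # Enough images: draw without replacement.
            batch.extend(rng.sample(pool, k_per_class))
        else:
            # Too few images: draw with replacement.
            batch.extend(rng.choices(pool, k=k_per_class))
    return batch


# Example: a toy, imbalanced label list with four categories.
labels = [0] * 50 + [1] * 5 + [2] * 2 + [3] * 20
class_index = build_class_index(labels)
batch = sample_balanced_batch(class_index, n_classes=3, k_per_class=4,
                              rng=random.Random(0))
print(len(batch))  # 3 categories x 4 images = 12 sample indices
```

Because categories are chosen uniformly rather than in proportion to their size, every category appears in batches at roughly the same rate under this sketch, which is one plausible way to counter the imbalance the abstract describes.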