Few-shot fine-grained recognition aims to distinguish highly similar objects from different subcategories using only limited labeled data. Current solutions mainly focus on capturing global semantic features while ignoring local detailed features; yet for fine-grained classification tasks, these details are essential. To further improve the performance of few-shot learning methods and better capture local details, this paper proposes several improved strategies. The main work of this paper is as follows:

(1) To enable the network to better capture local details, this paper proposes an effective attention-guided pyramid feature, which captures both global semantic features and low-level detail features for fine-grained image recognition under the few-shot setting. Specifically, features of different scales extracted by the backbone network are combined through a multi-scale feature pyramid and an attention structure. In addition, an attention-guided refinement strategy based on a multi-level attention pyramid is proposed; this strategy uses the attention mechanism to crop the original image, highlighting important foreground regions and eliminating background noise. Together, these methods improve the accuracy of few-shot fine-grained image recognition.

(2) To better capture the relationships among multi-scale features, this paper proposes an efficient cross-level relation-aware attention mechanism (CRA). Attention mechanisms play an important role in fine-grained visual analysis, but most existing methods operate on a single feature map and ignore the relations between feature maps, which can lose key details. The proposed cross-level relation-aware attention mechanism mines the relations between multi-scale features to capture this key information. More specifically, it comprises two modules: Cross-Level Relation-Aware Global Feature (CRGF) and Cross-Level Squeeze-and-Excitation Attention (CSEA). The former computes the pairwise relationship at each position between the last two layers of the backbone network, while the latter uses the global relation information captured at the high level, together with an attention mechanism, to capture detailed information at the low level. The two modules cooperate to effectively improve the accuracy of fine-grained recognition.
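The attention-guided pyramid feature in (1) can be pictured as an FPN-style top-down fusion with per-level channel attention. Below is a minimal PyTorch sketch; the module names (`AttentionPyramid`, `ChannelAttention`), the SE-style attention, and the ResNet-50 stage channel counts (512/1024/2048) are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """SE-style channel attention applied to one pyramid level (assumed)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (B, C, H, W) -> channel weights from global average pooling
        w = self.fc(x.mean(dim=(2, 3)))           # (B, C)
        return x * w.unsqueeze(-1).unsqueeze(-1)  # reweight channels

class AttentionPyramid(nn.Module):
    """Fuse multi-scale backbone features with attention (illustrative)."""
    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        self.laterals = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])
        self.attns = nn.ModuleList(
            [ChannelAttention(out_channels) for _ in in_channels])

    def forward(self, feats):
        # feats: list of backbone maps, high resolution -> low resolution
        laterals = [l(f) for l, f in zip(self.laterals, feats)]
        # top-down pathway: upsample coarser maps and add them to finer ones
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        # attention on each level, then pool and concatenate into one embedding
        pooled = [a(x).mean(dim=(2, 3)) for a, x in zip(self.attns, laterals)]
        return torch.cat(pooled, dim=1)  # (B, 3 * out_channels)
```

In a full few-shot pipeline, this pooled multi-scale embedding would feed the metric or classifier head in place of a single global-pooled feature.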
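The attention-guided refinement strategy can likewise be sketched: upsample an attention map to image resolution, threshold it, and crop the original image to the bounding box of the high-response region before a second forward pass. The function name, the thresholding rule, and the fallback to the full image are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def attention_guided_crop(image, attn_map, threshold=0.5, out_size=224):
    """Crop each image to the region its attention map marks as foreground.

    image:    (B, 3, H, W) input batch
    attn_map: (B, h, w) attention scores, e.g. from the pyramid levels
    """
    B, _, H, W = image.shape
    # upsample attention to image resolution and min-max normalize to [0, 1]
    attn = F.interpolate(attn_map.unsqueeze(1), size=(H, W),
                         mode="bilinear", align_corners=False).squeeze(1)
    lo = attn.amin(dim=(1, 2), keepdim=True)
    hi = attn.amax(dim=(1, 2), keepdim=True)
    attn = (attn - lo) / (hi - lo + 1e-6)
    crops = []
    for b in range(B):
        ys, xs = torch.where(attn[b] >= threshold)
        if len(ys) == 0:  # degenerate map: fall back to the full image
            ys, xs = torch.arange(H), torch.arange(W)
        y0, y1 = ys.min().item(), ys.max().item() + 1
        x0, x1 = xs.min().item(), xs.max().item() + 1
        crop = image[b:b + 1, :, y0:y1, x0:x1]
        crops.append(F.interpolate(crop, size=(out_size, out_size),
                                   mode="bilinear", align_corners=False))
    return torch.cat(crops, dim=0)  # (B, 3, out_size, out_size)
```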
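One plausible reading of the two CRA modules in (2), sketched in PyTorch: `CRGF` forms non-local-style pairwise relations at every position between the last two backbone stages, and `CSEA` squeezes the resulting relation map into a global vector that gates (excites) the low-level channels. All layer shapes, the 1x1 projections, and the softmax-scaled dot-product relation are assumptions filled in for illustration, not the paper's verified formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CRGF(nn.Module):
    """Cross-Level Relation-Aware Global Feature: pairwise relations between
    the last two backbone stages (a non-local-style sketch)."""
    def __init__(self, c_low=1024, c_high=2048, c_mid=256):
        super().__init__()
        self.q = nn.Conv2d(c_high, c_mid, 1)  # queries from the high level
        self.k = nn.Conv2d(c_low, c_mid, 1)   # keys from the low level
        self.v = nn.Conv2d(c_low, c_mid, 1)   # values from the low level

    def forward(self, f_low, f_high):
        # match spatial sizes so positions can be paired across levels
        f_low = F.interpolate(f_low, size=f_high.shape[-2:],
                              mode="bilinear", align_corners=False)
        B, _, H, W = f_high.shape
        q = self.q(f_high).flatten(2).transpose(1, 2)  # (B, HW, c_mid)
        k = self.k(f_low).flatten(2)                   # (B, c_mid, HW)
        v = self.v(f_low).flatten(2).transpose(1, 2)   # (B, HW, c_mid)
        rel = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)  # (B, HW, HW)
        out = (rel @ v).transpose(1, 2).reshape(B, -1, H, W)
        return out  # relation-aware map, c_mid channels

class CSEA(nn.Module):
    """Cross-Level Squeeze-and-Excitation Attention: the high-level relation
    map is squeezed into a global vector that reweights low-level channels."""
    def __init__(self, c_low=1024, c_rel=256, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(c_rel, c_low // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(c_low // reduction, c_low),
            nn.Sigmoid(),
        )

    def forward(self, f_low, rel_map):
        g = rel_map.mean(dim=(2, 3))                # squeeze: (B, c_rel)
        w = self.fc(g).unsqueeze(-1).unsqueeze(-1)  # excitation weights
        return f_low * w                            # reweighted low level
```

In this sketch, `CSEA` consumes the 256-channel relation map produced by `CRGF`; the reweighted low-level features and the relation-aware high-level map would then be fused before classification.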