| Fine-grained image recognition is a fundamental task in artificial intelligence that aims to distinguish between different subclasses of the same category. It has wide applications in intelligent retail, e-commerce, intelligent transportation, and other fields. However, the task is challenging because robust local features are difficult to learn for categories with subtle visual differences. Recently, destruction learning methods, which implicitly learn robust local features by destroying global image information and forcing the network to focus on discriminative parts, have attracted increasing attention. Nevertheless, existing destruction learning approaches have at least two shortcomings. First, they destroy global information through random image destruction, which can produce undesirable destruction results and introduce noise into the recognition task. Second, although the global information of the image is disrupted, the network still tends to focus excessively on background regions and to neglect the foreground regions that carry the truly discriminative information. This issue becomes even more severe for images with complex or abundant background content. This study addresses both shortcomings. To overcome the first, we propose a novel adaptive shuffling-based destruction (ASD) learning model for fine-grained image recognition. Our method jointly optimizes the image shuffling strategy and the deep vision network, and the learned shuffling serves as data augmentation that strengthens the classifier. To overcome the second, we build a foreground-aware boosted (FAB) recognition module on top of the ASD method. The FAB module incorporates a spatial attention mechanism that enables the network to identify and focus on the foreground region of an image. Adaptive destruction learning is then applied specifically to the foreground area to locate the discriminative regions more accurately. By paying more attention to the image foreground, the FAB module effectively recognizes the subtle differences between subclasses and provides more robust and accurate results. The proposed model was evaluated on four publicly available fine-grained datasets, and the experimental results demonstrate its promising performance. Compared with the current best-performing destruction learning methods for fine-grained image recognition, the proposed model improves accuracy by 0.3%, 0.2%, 0.5%, and 0.4% on the CUB-200-2011, Stanford-Car, Stanford-Dog, and FGVC-Aircraft datasets, respectively. Moreover, we evaluate the learned intermediate layers and show that our method is also effective at learning powerful mid-level layers. |
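To make the random destruction used by existing methods concrete, the sketch below shows one common form of it: the image is cut into an n×n grid and the cells are shuffled locally, first within each grid row and then within each grid column, so that every cell moves fewer than 2k positions horizontally and vertically (in the spirit of the region confusion mechanism of DCL). This is an illustrative NumPy sketch of the random baseline, not the ASD method; the names `jittered_permutation` and `shuffle_regions` and the parameters `n` and `k` are hypothetical, and the image height and width are assumed to be divisible by `n`.

```python
import numpy as np


def jittered_permutation(n, k, rng):
    """Random permutation of range(n) in which every index moves fewer than 2k places."""
    # Perturb each position with uniform noise in [-k, k) and sort by the noisy key.
    keys = np.arange(n) + rng.uniform(-k, k, size=n)
    return np.argsort(keys)


def shuffle_regions(image, n=4, k=1, seed=None):
    """Locally shuffle the cells of an n x n grid laid over an H x W (x C) image.

    Assumption for this sketch: H and W are both divisible by n.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    ph, pw = h // n, w // n
    # cells[i][j] is the patch in grid row i, grid column j.
    cells = [[image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] for j in range(n)]
             for i in range(n)]
    # Horizontal jitter: permute the cells inside each grid row.
    for i in range(n):
        perm = jittered_permutation(n, k, rng)
        cells[i] = [cells[i][p] for p in perm]
    # Vertical jitter: permute the cells inside each grid column.
    for j in range(n):
        perm = jittered_permutation(n, k, rng)
        column = [cells[p][j] for p in perm]
        for i in range(n):
            cells[i][j] = column[i]
    # Stitch the grid back together: cells along width, then rows along height.
    return np.concatenate([np.concatenate(row, axis=1) for row in cells], axis=0)
```

For example, `shuffle_regions(img, n=7, k=2, seed=0)` returns an image of the same shape whose 7×7 cells have been jittered. Because the permutation is drawn independently for every image, nothing ties the shuffle to where the discriminative parts lie, which is the weakness that a learned, adaptive shuffling strategy is intended to remove.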
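The foreground step of FAB can be pictured with a simple stand-in: derive a spatial attention map from a backbone feature map, keep the above-average positions, and take their bounding box as the foreground region. The FAB attention mechanism itself is learned and is not specified here; the parameter-free channel-sum attention, the mean threshold, and the function name `foreground_box` below are assumptions made purely for illustration.

```python
import numpy as np


def foreground_box(features, image_size):
    """Estimate a foreground box from a C x h x w feature map.

    Returns (top, left, bottom, right) in image pixels, with bottom/right exclusive.
    Assumption for this sketch: attention = channel-wise sum, thresholded at its mean.
    """
    # Parameter-free spatial attention: aggregate activations over channels.
    attention = features.sum(axis=0)              # shape (h, w)
    mask = attention > attention.mean()           # keep above-average positions
    if not mask.any():                            # flat map: fall back to the whole image
        return 0, 0, image_size[0], image_size[1]
    rows = np.flatnonzero(mask.any(axis=1))       # feature rows containing foreground
    cols = np.flatnonzero(mask.any(axis=0))       # feature columns containing foreground
    h, w = attention.shape
    sy, sx = image_size[0] / h, image_size[1] / w  # feature-cell size in pixels
    top = int(rows[0] * sy)
    bottom = int(np.ceil((rows[-1] + 1) * sy))
    left = int(cols[0] * sx)
    right = int(np.ceil((cols[-1] + 1) * sx))
    return top, left, bottom, right
```

With a 7×7 feature map whose activations are concentrated in feature rows 2 to 4 and columns 3 to 5, and a 224×224 input, each feature cell covers 32 pixels and the function returns `(64, 96, 160, 192)`. A destruction step restricted to the foreground would then operate only on `image[top:bottom, left:right]`, so that shuffling is confined to the region the attention map marks as foreground.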