With the rapid development of artificial intelligence technologies such as neural networks in recent years, image classification has achieved groundbreaking results in both academia and industry. Fine-grained image classification is one of the most active research directions in this field. Unlike general image classification, fine-grained image classification aims to identify the subcategories of a broader category, which differ from one another only in subtle visual details. For fine-grained images with large intra-class variance, a model must find local commonalities that vary only slightly within a class. For images whose parts show small inter-class variance, it must instead find more general differences between classes. Therefore, this paper focuses on mining the local nuances in images to improve fine-grained image classification performance. Building on convolutional neural networks and attention mechanisms, the main work of this paper is as follows.

(1) Most prior work on fine-grained image classification extracts the features of objects or parts in a unidirectional manner. It does not consider the potentially useful information between objects and parts, or the information within each of them across different feature levels. As a result, the fine-grained features that are discriminative for classification cannot be adequately exploited. To address this, this paper proposes a novel saliency-based interaction feedback network. The network adaptively learns fine-grained features by modeling the saliency of the object and its parts (OPs), the dependencies between OPs, and the self-calibration of OP features, using two attention-based modules. First, in the preliminary feature learning module, object and part streams are designed to obtain salient features that are expressive of object-related and part-related regions, respectively. Second, a progressive representation learning module is built on the interaction between OPs and their own feedback across feature levels. Its goal is to transform the salient features into discriminative fine-grained features, which serve as the high-level final representations of OPs. Extensive experiments on the CUB-200-2011, Stanford Cars and FGVC-Aircraft datasets demonstrate that the proposed method achieves state-of-the-art performance. By explicitly modeling the saliency of OPs, the proposed method overcomes this limitation of previous networks and captures discriminative fine-grained features for fine-grained image classification.

(2) Existing fine-grained image classification methods mainly distill information from high-level features in deep layers and ignore the importance of low-level features in shallow layers. This leads to a lack of diversity when localizing parts with the same semantics, which degrades classification performance. To address this issue, this paper proposes a fine-grained image classification method based on multi-level feature dependence. Specifically, a multi-level feature extraction module based on bidirectional paths is first designed. It comprises a vertical bidirectional path (VBP) and a horizontal bidirectional path (HBP). The VBP lets low-level features flow through the network, which improves the localization of diverse parts. The HBP then suppresses the most salient regions within these parts and obtains more discriminative features through part refinement. Second, attention gating is introduced into a long short-term memory (LSTM) network, and a feature dependence extraction module based on this attention LSTM is proposed. The module adjusts the focus on fine-grained information at each level in order to mine fine-grained features and improve their discriminability. Experimental results show that the proposed method achieves 90.8%, 95.9% and 95.4% on three widely used datasets, respectively, outperforming state-of-the-art methods. By extracting multi-level features and establishing their interdependence, the proposed method ensures both the discriminability and the diversity of the features and boosts fine-grained image classification performance.

(3) To make fine-grained image classification practical, this paper proposes a lightweight attention-based dictionary-learning network with two lightweight modules and efficient training and testing processes. First, a compact attention dictionary learning module is constructed to sparsely encode attention-weighted features. This helps obtain a locally compact, low-rank and discriminative representation. Then, an attention-guided data augmentation module is proposed. It uses the attention captured in the feedback path to crop and drop discriminative parts, driving the network to learn richer fine-grained discriminative features. Building on these modules, the efficient training and testing processes further improve the classification performance of the proposed method. Extensive experiments are first conducted on three standard public datasets. In addition, a real-world dish recognition dataset is constructed, and a series of experiments on it verifies the practicality of the proposed method. Through compact feature representation and well-designed data augmentation, the proposed method achieves efficient and practical fine-grained image classification in everyday scenarios.
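The attention-guided crop and drop operations mentioned in contribution (3) can be illustrated with a minimal pure-Python sketch. This is not the thesis implementation. It assumes a single-channel image and an attention map that has already been upsampled to the same resolution, both stored as lists of lists. The function names and the threshold `theta` are hypothetical. Cropping keeps the bounding box of cells whose normalized attention is at least `theta` and resizes that box back to the input size using nearest-neighbour sampling. Dropping zeroes those same cells, so the network has to find evidence elsewhere.

```python
from typing import List, Tuple

Grid = List[List[float]]  # hypothetical representation: H x W single-channel grid


def normalize(att: Grid) -> Grid:
    """Scale an attention map to [0, 1] by its maximum (assumes max > 0)."""
    peak = max(max(row) for row in att)
    return [[v / peak for v in row] for row in att]


def attention_crop_box(att: Grid, theta: float) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right), with exclusive ends, covering all
    cells whose normalized attention is >= theta (theta <= 1 keeps the peak)."""
    norm = normalize(att)
    coords = [(i, j) for i, row in enumerate(norm)
              for j, v in enumerate(row) if v >= theta]
    rows = [i for i, _ in coords]
    cols = [j for _, j in coords]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


def resize_nearest(img: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize, used to scale the crop back to full size."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def attention_crop(img: Grid, att: Grid, theta: float = 0.5) -> Grid:
    """Zoom into the highly attended region (attention cropping)."""
    top, left, bottom, right = attention_crop_box(att, theta)
    patch = [row[left:right] for row in img[top:bottom]]
    return resize_nearest(patch, len(img), len(img[0]))


def attention_drop(img: Grid, att: Grid, theta: float = 0.5) -> Grid:
    """Erase the highly attended cells (attention dropping)."""
    norm = normalize(att)
    return [[0.0 if a >= theta else v for v, a in zip(img_row, att_row)]
            for img_row, att_row in zip(img, norm)]
```

A natural extension is to use separate thresholds for cropping and dropping. A single `theta` is used here only to keep the sketch short.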
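The compact attention dictionary learning module in contribution (3) sparsely encodes attention-weighted features. The abstract does not say which sparse coding solver is used. The sketch below substitutes a standard one, the iterative shrinkage-thresholding algorithm (ISTA) for the lasso problem: minimize 0.5 * ||a * x - D z||^2 + lam * ||z||_1, where `a * x` is the element-wise attention-weighted feature. Only the encoding step is shown. The dictionary `D` is taken as given rather than learned, and all names and the weight `lam` are hypothetical. The step size uses the squared Frobenius norm of `D`, which upper-bounds the largest eigenvalue of `D^T D` and therefore gives a safe step.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # d x k, row-major; columns are dictionary atoms


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: shrink v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def sparse_code(feature: Vector, attention: Vector, dictionary: Matrix,
                lam: float = 0.1, iters: int = 200) -> Vector:
    """ISTA sparse code of the attention-weighted feature over a fixed dictionary
    (assumes a non-zero dictionary)."""
    x = [f * a for f, a in zip(feature, attention)]  # attention weighting
    dt = transpose(dictionary)
    # ||D||_F^2 >= largest eigenvalue of D^T D, so 1 / lip is a safe step size.
    lip = sum(v * v for row in dictionary for v in row)
    z = [0.0] * len(dt)
    for _ in range(iters):
        residual = [xi - ri for xi, ri in zip(x, matvec(dictionary, z))]
        grad_step = matvec(dt, residual)  # D^T (x - D z), the negative gradient
        z = [soft_threshold(zi + gi / lip, lam / lip)
             for zi, gi in zip(z, grad_step)]
    return z
```

With an identity dictionary the lasso solution is simply the soft-thresholded input. Features whose attention weight is zero are therefore encoded as exact zeros, which is one way attention and sparsity together yield a compact code.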