
Research On Mix-Based Data Augmentation Method In Fine-Grained Classification

Posted on: 2024-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: H C Li
Full Text: PDF
GTID: 2568307136990399
Subject: Information networks
Abstract/Summary:
In computer vision, fine-grained image classification is a challenging task because fine-grained images show minuscule inter-class variations and substantial intra-class differences. Accurate classification demands highly discriminative models, and training such models is particularly difficult when training data is scarce. Data augmentation, for example through affine transformations, is a viable solution to this issue. However, traditional augmentation techniques such as flipping, rotating, or cropping often distort the prominent features associated with certain classes, which limits their effectiveness for fine-grained classification tasks. In recent years, mixed image data augmentation methods have become a focal point of research. These methods blend images from different classes and assign compound labels to the resulting image, generating new images that differ greatly from the originals and improving dataset diversity. This thesis investigates mixed image data augmentation methods for fine-grained classification tasks. The research proceeds from three perspectives: offline mixed data augmentation, offline mixed data augmentation assisted by external knowledge, and online mixed data augmentation. It offers the following innovations.

First, to address the label noise caused by the mismatch between generated mixed images and their corresponding labels in fine-grained tasks, the thesis proposes a Saliency-map Guided Data Augmentation (SGDA) method. When generating mixed images, a saliency map produced by Grad-CAM guides the selection of the foreground from the source image, which avoids introducing irrelevant regions. In addition, the semantic information from the generated Grad-CAM map is used to evaluate the labels of newly generated mixed images, ensuring a better match between the mixed images and their labels. The approach provides a fine-grained saliency-region localization method that builds on Grad-CAM to improve the precision of saliency-region localization. Because it relies on Grad-CAM without increasing complexity, the proposed method effectively reduces label noise. Its performance is validated on four publicly available fine-grained recognition datasets.

Second, considering that the randomness in the selection of mixed image samples may restrict the performance of mixed data augmentation, the thesis introduces a Commonsense-assisted Fine-grained Image Data Augmentation (CoFIDA) method. It explores the effectiveness of commonsense assistance for the first time in data augmentation research. Commonsense knowledge is used to uncover potential correlations between sample labels, and a multi-branch convolutional neural network structure is designed to construct image mixing strategies. This enables targeted mixing of images and makes classification networks pay more attention to subtle differences between targets. The effectiveness of CoFIDA is empirically evaluated on four public fine-grained recognition datasets.

Third, in response to the imprecise saliency-region localization of offline mixed data augmentation methods on fine-grained images with complex backgrounds, the thesis proposes an Automatic Mixed Data Augmentation method based on cross-region attention (AMDA). Unlike offline methods that crop entire regions, AMDA first introduces a regional cross-attention mechanism. Through the collaboration of window and shifted-window cross-attention, this mechanism explores the correlation of image blocks within and between windows. A mixed image generation subnetwork is then designed on top of the regional cross-attention mechanism and embedded into the fine-grained classification network. For this embedding, the thesis mainly resolves the dual optimization problem caused by the coexistence of the mixed image generation subnetwork and the fine-grained classification network. Extensive experimental evaluations on four large-scale and two small-scale fine-grained datasets demonstrate the superiority of the proposed method in fine-grained image data augmentation.
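
To make the idea of mixing images and assigning compound labels concrete, the following is a minimal Python sketch of patch-based mixing in which the label weights come from saliency mass rather than pasted area. It is not the thesis's SGDA, CoFIDA, or AMDA implementation: the function names, the box format, and the weighting rule are all assumptions chosen for illustration, and the saliency maps are assumed to be given (for instance by Grad-CAM) instead of being computed here.

```python
from typing import List, Tuple

Grid = List[List[float]]          # a grayscale image or a saliency map
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), end-exclusive


def box_mass(saliency: Grid, box: Box) -> float:
    """Sum of the saliency values that fall inside the box."""
    top, left, bottom, right = box
    return sum(sum(row[left:right]) for row in saliency[top:bottom])


def mix_with_saliency_labels(
    target: Grid, source: Grid, sal_target: Grid, sal_source: Grid, box: Box
) -> Tuple[Grid, float, float]:
    """Paste the box region of `source` into `target` (hypothetical helper).

    Returns the mixed image and the label weights (target, source).
    Assumed weighting rule: each class is weighted by the share of its
    own saliency that is still visible in the mixed image.
    """
    top, left, bottom, right = box
    mixed = [row[:] for row in target]            # copy, keep inputs intact
    for r in range(top, bottom):
        mixed[r][left:right] = source[r][left:right]

    height, width = len(target), len(target[0])
    full = (0, 0, height, width)
    total_t = box_mass(sal_target, full)
    total_s = box_mass(sal_source, full)
    if total_t <= 0 or total_s <= 0:
        raise ValueError("saliency maps must have positive total mass")

    kept_target = 1.0 - box_mass(sal_target, box) / total_t
    pasted_source = box_mass(sal_source, box) / total_s
    norm = kept_target + pasted_source
    if norm == 0:
        # No salient content survives: fall back to area-based weights.
        area = (bottom - top) * (right - left) / (height * width)
        return mixed, 1.0 - area, area
    return mixed, kept_target / norm, pasted_source / norm


# Usage: paste the top-left pixel of a 2x2 source into a 2x2 target.
target = [[1.0, 1.0], [1.0, 1.0]]
source = [[9.0, 9.0], [9.0, 9.0]]
sal_target = [[1.0, 0.0], [0.0, 1.0]]
sal_source = [[3.0, 1.0], [0.0, 0.0]]
mixed, w_target, w_source = mix_with_saliency_labels(
    target, source, sal_target, sal_source, (0, 0, 1, 1)
)
```

In this toy case the pasted pixel covers only a quarter of the image but carries most of the source's saliency, so the source label receives the larger weight. An area-based rule would instead give it 0.25, which illustrates the kind of label mismatch the abstract describes.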
Keywords/Search Tags:Data augmentation, Fine-grained image classification, Saliency map, Cross-attention mechanism, Commonsense map