A Study On Fine-Grained Image Recognition Based On Transformer

Posted on: 2024-07-31    Degree: Master    Type: Thesis
Country: China    Candidate: T Su
GTID: 2568307085494584    Subject: Communication and Information System
Abstract/Summary:
Fine-grained image recognition (FGIR) has long been a challenging task in computer vision. Unlike traditional image classification, FGIR must distinguish subcategories within a single base category, and its data exhibit large intra-class variance and subtle inter-class variance, which makes developing effective algorithms considerably difficult. A key cue for distinguishing fine-grained categories is the set of highly discriminative local regions in an image, such as the head, torso, and wings of a bird; this is also consistent with human cognition. Mining and representing features from these regions is therefore an important research direction in FGIR. In recent years, the Transformer, with its powerful self-attention mechanism, has achieved great success in traditional image classification, surpassing classic methods based on convolutional neural networks (CNNs), but it has so far seen relatively little use in FGIR. Moreover, the Transformer's self-attention operates on image patches, which is not fine-grained enough for mining local features in FGIR. At the same time, existing methods largely ignore the redundant features in fine-grained images, so the feature representations learned by the network contain unnecessary information. This thesis studies discriminative feature mining and representation in fine-grained images in depth and proposes two efficient Transformer-based FGIR algorithms. The main contributions are:

(1) A CNN-Transformer hybrid model. To mine discriminative local regions in fine-grained images, this algorithm uses a hybrid CNN-Transformer network: a pre-trained CNN extracts information about the contour and position of the foreground object, from which an attention map is generated and embedded into the original image. The attention map constrains and guides the learning of the Transformer backbone, reducing interference from background noise and enabling feature mining at a finer scale.

(2) A fine-grained classification model that introduces an information bottleneck. To represent the features of discriminative local regions, this algorithm uses the information bottleneck (IB) to constrain the training of the Transformer network, filtering out redundant feature information that previous methods often ignore. A regularized cross-entropy loss improves the performance of the backbone network without adding any extra parameters.

In summary, building on the Transformer network, this thesis designs algorithms from two different perspectives targeting the discriminative regions of fine-grained images, and conducts extensive experiments to verify the effectiveness of the proposed models.
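The attention-guided input of the CNN-Transformer hybrid model (task 1) can be illustrated with a small sketch. It assumes the foreground attention map is obtained by summing the pre-trained CNN's feature map over channels, min-max normalizing it to [0, 1], upsampling it to the image size with nearest-neighbour interpolation, and embedding it by element-wise multiplication with each image channel. These choices and all function names (activation_map, min_max_normalize, upsample_nearest, embed_attention) are hypothetical illustrations rather than the thesis's implementation, and plain Python lists stand in for tensors to keep the example dependency-free.

```python
from typing import List

# Hypothetical helper types: a 2-D map (rows x cols) and a
# multi-channel array (channels x rows x cols).
Matrix = List[List[float]]
Tensor3 = List[Matrix]


def activation_map(features: Tensor3) -> Matrix:
    """Sum a CNN feature map over its channels to get one h x w activation map."""
    h, w = len(features[0]), len(features[0][0])
    return [[sum(ch[i][j] for ch in features) for j in range(w)] for i in range(h)]


def min_max_normalize(m: Matrix) -> Matrix:
    """Rescale values to [0, 1]; a constant map becomes all ones."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    span = hi - lo
    if span == 0:
        return [[1.0 for _ in row] for row in m]
    return [[(v - lo) / span for v in row] for row in m]


def upsample_nearest(m: Matrix, out_h: int, out_w: int) -> Matrix:
    """Nearest-neighbour resize of an h x w map to out_h x out_w."""
    h, w = len(m), len(m[0])
    return [[m[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def embed_attention(image: Tensor3, features: Tensor3) -> Tensor3:
    """Weight every image channel pixel-wise by the (assumed) foreground attention map."""
    height, width = len(image[0]), len(image[0][0])
    attn = upsample_nearest(min_max_normalize(activation_map(features)), height, width)
    return [[[px * attn[i][j] for j, px in enumerate(row)]
             for i, row in enumerate(channel)]
            for channel in image]


# Usage: a 2-channel 2x2 CNN feature map guiding a 1-channel 4x4 image.
feats = [[[0.0, 1.0], [2.0, 3.0]], [[0.0, 1.0], [0.0, 1.0]]]
img = [[[2.0] * 4 for _ in range(4)]]
for row in embed_attention(img, feats)[0]:
    print(row)
# prints [0.0, 0.0, 1.0, 1.0] twice, then [1.0, 1.0, 2.0, 2.0] twice
```

Pixels where the CNN responds weakly are suppressed, while strongly activated (presumed foreground) pixels keep their values. One possible variant adds a floor value to the map so that background is attenuated rather than zeroed; the abstract does not say how the embedding is actually performed.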
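For the information-bottleneck model (task 2), one common way to turn the IB principle into a training loss is the variational IB objective: cross-entropy on the labels (prediction term) plus beta times a KL divergence that compresses the feature distribution toward a standard normal prior. The sketch below is an assumption, not the thesis's loss: it fixes the encoder's standard deviation to a constant sigma so that no extra parameters are needed, computes logits deterministically (omitting the usual reparameterized sampling of the features), and uses hypothetical names (cross_entropy, gaussian_kl_to_standard, ib_regularized_loss) and default values for beta and sigma.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, using log-sum-exp for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


def gaussian_kl_to_standard(mu: Sequence[float], sigma: float) -> float:
    """KL(N(mu, sigma^2 I) || N(0, I)) with a fixed, shared standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    var = sigma * sigma
    return 0.5 * sum(m * m + var - 1.0 - math.log(var) for m in mu)


def ib_regularized_loss(logits: Sequence[float], label: int,
                        features: Sequence[float],
                        beta: float = 1e-3, sigma: float = 1.0) -> float:
    """Hypothetical IB-style loss: cross-entropy + beta * KL compression term."""
    return cross_entropy(logits, label) + beta * gaussian_kl_to_standard(features, sigma)


# Usage: two-class logits and a 3-d feature vector for one image.
loss = ib_regularized_loss([2.0, 0.5], 0, [0.3, -1.2, 0.8], beta=0.01)
print(round(loss, 4))
```

With sigma fixed, the KL term equals 0.5 times the squared norm of the features plus a constant, so in this simplified form the compression term behaves like a feature-norm penalty. A larger beta trades classification fit for stronger compression of the representation.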
Keywords/Search Tags: Fine-grained Image Recognition, Transformer, Information Bottleneck