A Study On Fine-Grained Image Recognition Based On Transformer

Posted on:2024-07-31

Degree:Master

Type:Thesis

Country:China

Candidate:T Su

GTID:2568307085494584

Subject:Communication and Information System

Abstract/Summary:

Fine-grained image recognition (FGIR) has long been a challenging task in computer vision. Unlike traditional image classification, FGIR requires distinguishing subcategories within a basic-level category, and its data exhibit large intra-class variance and subtle inter-class variance, which makes developing effective algorithms considerably difficult. A key cue for telling fine-grained categories apart is a small set of highly discriminative local regions in the image, such as the head, torso, and wings of a bird, which is also consistent with how humans recognize objects. Mining and representing the features of these regions is therefore an important research direction in FGIR.

In recent years, the Transformer, with its powerful self-attention mechanism, has achieved great success in traditional image classification and has surpassed classic methods based on convolutional neural networks (CNNs), but it has so far been applied less often to FGIR. Moreover, the Transformer's self-attention operates on image patches, a granularity too coarse for mining the local features that FGIR relies on. At the same time, existing methods largely ignore the redundant features in fine-grained images, so the feature representations learned by the network contain unnecessary information.

This thesis studies the mining and representation of discriminative features in fine-grained images in depth and proposes two efficient Transformer-based FGIR algorithms. The main contributions are:

(1) A CNN-Transformer hybrid model. To mine discriminative local regions in fine-grained images, this algorithm proposes a hybrid CNN-Transformer network. A pre-trained CNN extracts information about the contour and position of the foreground, generates an attention map, and embeds the attention map into the original image. This constrains and guides the learning of the Transformer backbone, reduces interference from background noise, and enables feature mining at a finer scale.

(2) A fine-grained classification model that introduces an information bottleneck. To represent the features of discriminative local regions, this algorithm uses the information bottleneck (IB) to constrain the training of the Transformer network and filters out redundant feature information that previous methods often ignore. A regularized cross-entropy loss improves the performance of the backbone network without adding any parameters.

In summary, building on the Transformer network, this thesis designs algorithms from two different perspectives that target the discriminative regions of fine-grained images, and extensive experiments verify the effectiveness of the proposed models.
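The abstract does not say how the attention map in contribution (1) is computed or embedded. As a minimal sketch, assuming a class-activation-style map obtained by averaging the CNN's final feature maps over channels, and assuming that "embedding" means re-weighting every image pixel by the upsampled map, the idea might look like the pure-Python example below. The function names and the background floor `alpha` are hypothetical illustrations, not details taken from the thesis.

```python
from typing import List

Grid = List[List[float]]  # an h x w map stored as nested lists


def channel_mean(features: List[Grid]) -> Grid:
    """Average C feature maps (each h x w) into a single h x w activation map."""
    c = len(features)
    h, w = len(features[0]), len(features[0][0])
    return [[sum(f[i][j] for f in features) / c for j in range(w)] for i in range(h)]


def min_max_normalize(a: Grid) -> Grid:
    """Rescale values to [0, 1]; a constant map becomes all ones."""
    flat = [v for row in a for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[1.0 for _ in row] for row in a]
    return [[(v - lo) / (hi - lo) for v in row] for row in a]


def upsample_nearest(a: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize of an h x w map to out_h x out_w."""
    h, w = len(a), len(a[0])
    return [[a[i * h // out_h][j * w // out_w] for j in range(out_w)] for i in range(out_h)]


def embed_attention(image: List[Grid], features: List[Grid], alpha: float = 0.3) -> List[Grid]:
    """Weight every channel of `image` by the normalized, upsampled attention map.

    `alpha` is a hypothetical floor: background pixels are attenuated to
    alpha * value instead of being erased, while peak-attention pixels are kept.
    """
    out_h, out_w = len(image[0]), len(image[0][0])
    att = upsample_nearest(min_max_normalize(channel_mean(features)), out_h, out_w)
    return [
        [[px * (alpha + (1 - alpha) * att[i][j]) for j, px in enumerate(row)]
         for i, row in enumerate(ch)]
        for ch in image
    ]


# One-channel 4x4 image of ones; a 2-channel 2x2 CNN feature map that peaks top-left.
image = [[[1.0] * 4 for _ in range(4)]]
features = [[[4.0, 0.0], [0.0, 0.0]], [[2.0, 0.0], [0.0, 0.0]]]
out = embed_attention(image, features)
```

Here the top-left quadrant, where the feature map responds, keeps its original intensity, while the rest of the image is scaled down to `alpha`. Keeping a non-zero floor is a design choice in this sketch so that context outside the foreground is suppressed rather than discarded.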
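The abstract also leaves the exact form of the IB-regularized loss in contribution (2) unspecified. One standard formulation is the variational information bottleneck bound, which adds a KL term to the usual cross-entropy: loss = CE(y, logits) + beta * KL(q(z|x) || N(0, I)). The sketch below assumes the features z are modeled as a diagonal Gaussian whose mean `mu` and log-variance `log_var` come from the backbone; these names, the standard-normal prior, and the weight `beta` are assumptions for illustration. Since the abstract states that the thesis's loss adds no parameters, its actual formulation may differ from this one.

```python
import math
from typing import List


def cross_entropy(logits: List[float], label: int) -> float:
    """Softmax cross-entropy for one sample, using log-sum-exp for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def gaussian_kl(mu: List[float], log_var: List[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over feature dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def ib_loss(logits: List[float], label: int, mu: List[float],
            log_var: List[float], beta: float = 1e-3) -> float:
    """Cross-entropy plus a beta-weighted KL penalty (assumed variational IB bound).

    The KL term penalizes information the features carry about the input,
    pushing the network to discard redundant detail; beta is hypothetical.
    """
    return cross_entropy(logits, label) + beta * gaussian_kl(mu, log_var)


# Example: a confident 3-class prediction with a slightly off-prior feature mean.
loss = ib_loss([2.0, 0.0, 0.0], label=0, mu=[1.0, 0.0], log_var=[0.0, 0.0], beta=0.1)
```

Larger `beta` compresses the representation more aggressively at the cost of classification fit; `beta = 0` recovers plain cross-entropy.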

Keywords/Search Tags:

Fine-grained Image Recognition, Transformer, Information Bottleneck

Related items

1	Fine-grained Image Recognition Based On Deformable Transformer And Multi-Scale Attention
2	Transformer-Based Fine-Grained Image Classification Method
3	Research On Fine Grained Image Recognition Method Based On Visual Transformer And Data Optimization
4	Analysis And Research Of Key Technologies For Fine-grained Image Recognition Based On Convolutional Neural Networks
5	Research On Fine-Grained Image Analysis Based On Machine Learning
6	Research On Fine-grained Image Recognition For Weakly Supervised Scenes
7	Research On Fine-grained Image Recognition Algorithm Based On Transformer
8	Research And Application On Fine-Grained Image Classification Based On Bilinear Model
9	Research On Fine-Grained Car Recognition Based On Deep Semantic Features Enhancement
10	Fine-Grained Recognition Of Yunnan Wild Bird Images Based On Deep Learning