Lightweight Design Of The Visual Transformer Network Structure

Posted on:2024-02-24

Degree:Master

Type:Thesis

Country:China

Candidate:L Chen

GTID:2568307118450804

Subject:Information and Communication Engineering

Abstract/Summary:

The development of artificial intelligence is driving the intelligent and widespread use of computer vision, and the Transformer is becoming increasingly mature in computer vision applications. However, a Transformer needs a huge number of parameters and a large amount of computation to reach high performance, which makes Transformer networks more and more complex. To address these problems, this study compares several commonly used model lightweighting methods and selects one for designing the network structure used in the experiments. Swin Transformer and MobileViT are chosen as the experimental models: Swin Transformer handles multiple computer vision tasks and performs well, and MobileViT is a Transformer counterpart of the lightweight network MobileNet. For Swin Transformer, a Local Aggregation module is introduced, and image classification experiments are run with the same number of parameters; the results show that the improved method raises accuracy. In addition, this study takes the more lightweight GhostNet and MobileViT networks and applies GhostNet's plug-and-play Ghost Module to MobileViT to obtain a lighter MobileViT; the experimental results show that the number of parameters is reduced.

The main work of this thesis is as follows.

1. Swin Transformer uses the shifted window mechanism and performs well in computer vision tasks such as image classification, object detection and semantic segmentation. In this thesis, a Local Aggregation module is introduced into the original Swin Transformer model. With the same number of parameters, the new model is significantly more accurate than the original Swin model on image classification, and it performs well on CIFAR-10, CIFAR-100, Caltech 101, Mini-ImageNet and other datasets.

2. This thesis combines the existing networks GhostNet and MobileViT by applying the plug-and-play Ghost Module from GhostNet to MobileViT, and runs experiments on three MobileViT models. The experiments show that the improved method reduces the number of parameters, which demonstrates the feasibility of this joint approach.

In summary, to address the large parameter count and heavy computation of Transformer networks, this thesis further improves the existing Swin Transformer network and achieves a performance improvement. The joint GhostNet and MobileViT method is also used to make the ViT model more lightweight. Experiments show that the joint GhostNet and MobileViT method can improve the performance of the Transformer network.
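
The parameter saving that a Ghost Module can bring may be illustrated with a short counting sketch. It is not the thesis's implementation. It assumes the usual GhostNet formulation, in which a primary convolution produces a fraction of the output feature maps and cheap depthwise filters generate the rest. The function names, the layer sizes and the default `ratio=2` are hypothetical choices for illustration, and bias and normalization parameters are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias and normalization ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k=3, d=3, ratio=2):
    """Weights in a Ghost-style replacement for a c_in -> c_out convolution.

    Assumes c_out is divisible by ratio (an assumption of this sketch).
    """
    if c_out % ratio != 0:
        raise ValueError("this sketch assumes c_out is divisible by ratio")
    intrinsic = c_out // ratio            # maps produced by the primary convolution
    ghost = intrinsic * (ratio - 1)       # maps produced by cheap depthwise filters
    primary = c_in * intrinsic * k * k    # ordinary convolution to the intrinsic maps
    cheap = ghost * d * d                 # depthwise: one d x d filter per ghost map
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
    standard = conv_params(64, 128, 3)
    ghost = ghost_params(64, 128, k=3, d=3, ratio=2)
    print(standard)                       # 73728
    print(ghost)                          # 37440
    print(round(standard / ghost, 2))     # 1.97
```

Under these assumptions the layer needs roughly half as many weights, which is consistent with the reduced parameter count reported for the MobileViT variants. The actual saving depends on which layers are replaced and on the chosen ratio.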

Keywords/Search Tags:

Transformer, model lightweighting, network structure designing, image classification

Related items

1	Optimization Study Of Transformer Model Based On Image Classification
2	Research On Multi-label Image Classification Based On Multi-scale Feature Enhancement
3	Research On Texture Image Classification Algorithm Based On Deep Learning
4	Part Based Method For Fine Grained Image Classification
5	Research On Flame Detection Method Based On Deep Neural Network
6	Research On Fine-grained Image Classification Based On Multi-branch Attention And Fused Multi-level Features
7	Research On The Semantic Segmentation Method Of Image Based On Transformer
8	Research On Image Caption Algorithm Based On Transformer Architecture
9	Research On Data Efficient Vision Transformer Network
10	A Research To The Classification Of Periodicals In Art Designing--The Category Of Environmental Art Designing