| Advances in artificial intelligence are driving the intelligent and widespread use of computer vision, and the Transformer is becoming increasingly mature in computer vision applications. However, the Transformer has fundamental problems: achieving high performance requires a huge number of parameters and a large amount of computation, which makes Transformer networks increasingly complex. To address these problems, this study compares several commonly used model lightweighting methods and selects one of them to guide the design of the network structure used in the experiments. Swin Transformer is chosen because it handles multiple computer vision tasks and performs well, and MobileViT is chosen because it is a Transformer version of the lightweight network MobileNet. For Swin Transformer, a Local Aggregation module is introduced and evaluated on image classification with the same number of parameters; the results show that the improved method achieves higher accuracy. In addition, the plug-and-play Ghost Module from the lightweight GhostNet network is applied to MobileViT to make MobileViT lighter; the experimental results show that the number of parameters is reduced. The main work of this thesis is as follows. 1. Swin Transformer uses a shifted window mechanism and performs well in computer vision tasks such as image classification, object detection, and semantic segmentation. In this thesis, a Local Aggregation module is introduced into the original Swin Transformer model. With the same number of parameters, the new model is significantly more accurate than the original Swin model on image classification, and it performs well on CIFAR-10, CIFAR-100, Caltech-101, Mini-ImageNet, and other datasets. 2. This thesis combines the existing networks GhostNet and MobileViT by applying the plug-and-play Ghost Module from GhostNet to MobileViT, and conducts experiments on three MobileViT models. The experiments show that the improved method reduces the number of parameters, which demonstrates the feasibility of this joint approach. In summary, to address the large parameter count and heavy computation of Transformer networks, this thesis further improves the existing Swin Transformer network and achieves a performance improvement. The joint GhostNet and MobileViT method is also used to make ViT models more lightweight. Experiments show that the joint GhostNet and MobileViT method can improve the performance of the Transformer network. |
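
The parameter reduction attributed to the Ghost Module can be illustrated with a rough weight count. The sketch below is not taken from the thesis: it assumes the commonly described Ghost Module layout (a primary convolution that produces a fraction of the output channels, followed by a cheap depthwise convolution that generates the remaining ones), ignores biases and normalization layers, and uses hypothetical function names and channel sizes chosen only for illustration.

```python
import math


def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, ratio=2, dw_k=3):
    """Weights in a Ghost-style replacement for the same layer.

    Assumed layout (illustrative, not the thesis's exact design):
      1. a primary k x k convolution producing ceil(c_out / ratio) channels;
      2. a cheap dw_k x dw_k depthwise convolution that derives the
         remaining (ratio - 1) "ghost" maps from each primary channel.
    """
    intrinsic = math.ceil(c_out / ratio)
    primary = c_in * intrinsic * k * k
    cheap = intrinsic * (ratio - 1) * dw_k * dw_k
    return primary + cheap


# Hypothetical layer sizes, chosen only to show the arithmetic.
standard = conv_params(64, 128, 3)
ghost = ghost_params(64, 128, 3)
print(standard, ghost, round(standard / ghost, 2))
```

Under these assumptions the weight count falls by a factor close to `ratio`, which matches the direction of the parameter reduction reported for the modified MobileViT models; the actual savings depend on which layers are replaced.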