Research On Calligraphy Style Classification Algorithm Based On Attention Mechanism

Posted on: 2023-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: W H Yu
Full Text: PDF
GTID: 2555307040499574
Subject: Computer technology
Abstract/Summary:
Calligraphy is part of the traditional culture of the Chinese nation and is widely loved. With the development of information technology, a large number of calligraphy images are now stored on the Internet, and classifying them effectively is an important problem. In recent years, with the continuing development of machine learning, convolutional neural networks (CNNs) have proved effective for this problem. The five basic calligraphic fonts differ obviously in their characteristics and are usually easy to identify. The differences between calligraphic styles, by contrast, are small and hard for an untrained person to see, so styles are often difficult to distinguish, and calligraphy style classification is the more challenging task. However, most existing research focuses on classifying the five basic calligraphic fonts, and few studies address style classification. In addition, existing calligraphy style classification models are relatively dated in their structural design and cannot learn style features well, so there is considerable room to improve accuracy. This thesis carries out research in the following three areas.

(1) First, the structure of the CNN is studied. By analyzing the existing calligraphy style classification model, its structure is optimized and a CNN-based calligraphy style classification algorithm is explored. Following the idea used in InceptionNet and VGG, several small convolution kernels replace one large kernel, which increases the depth of the network while keeping the same receptive field. Starting from the existing model, each 5×5 convolution is decomposed into two 3×3 convolutions, which together with pooling, batch normalization, and a nonlinear activation function form one network stage. A CNN with five such stages is then constructed and used as the baseline model. Experimental results show that the optimized CNN structure is more accurate than the original structure and trains more stably. The experiments also show that the plain network structure is significantly more accurate than the widely used residual network structure, which indicates that a plain network is better suited to the calligraphy style classification task.

(2) Second, because a model's feature learning ability directly affects classification performance, the thesis explores a CNN calligraphy style classification algorithm enhanced with an attention module. The features of a calligraphy style are subtle but important: an experienced viewer can often identify a style from a single local feature. The thesis therefore adopts CBAM, a channel-spatial attention module, to emphasize calligraphy style features along the channel and spatial dimensions respectively. The module has a simple structure and can be embedded directly into an existing CNN without changing the original structure. In the proposed method, two CBAM modules are embedded into the fourth and fifth stages of the baseline, one per stage, to form a combined model. Experimental results show that the attention module improves the accuracy of the baseline with all other parameters unchanged. In addition, a Grad-CAM visualization experiment shows that the classification network can effectively identify different kinds of calligraphy styles.

(3) Finally, the thesis studies the Vision Transformer (ViT), a newer image classification structure based on self-attention, and explores a ViT-based calligraphy style classification algorithm with local feature enhancement. The algorithm follows a process similar to that of ViT. First, the calligraphy image is divided into image patches of equal size, which are arranged into a one-dimensional sequence. Next, each patch vector in the sequence is linearly projected to a specified dimension to form a patch embedding. A position encoding is then added to each patch embedding, and a class token is prepended to the sequence. The sequence is fed into a stack of standard Transformer encoders for feature transformation, and the output of the last encoder is passed through a multilayer perceptron (MLP) to obtain the classification result. However, because multi-head self-attention in ViT is global, it lacks the ability to extract local information. The thesis therefore proposes replacing the MLP in the feed-forward network layer with a CNN to emphasize local features within the Transformer structure. Experimental results on a calligraphic style dataset show that the proposed model classifies calligraphic style images well and outperforms several existing Transformer structures, providing a new method for calligraphy style classification.
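
Two pieces of arithmetic behind the abstract can be checked with a short Python sketch: that two stacked 3×3 convolutions cover the same receptive field as one 5×5 convolution with fewer weights, and how many tokens a ViT-style model sees once an image is cut into patches and a class token is added. The abstract does not give channel widths, image sizes, or patch sizes, so the values below (64 channels, a 224-pixel image, 16-pixel patches) are illustrative assumptions only, and the function names are hypothetical.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of stride-1 convolutions applied one after another."""
    field = 1
    for k in kernel_sizes:
        field += k - 1  # each stride-1 layer widens the field by k - 1
    return field


def conv_weights(kernel_size, channels):
    """Weights in one k x k convolution with `channels` inputs and outputs (bias ignored)."""
    return kernel_size * kernel_size * channels * channels


def vit_sequence_length(image_size, patch_size):
    """Tokens fed to the encoder: one per square patch, plus one class token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1


# Assumed values, not taken from the thesis.
channels = 64
image_size, patch_size = 224, 16

rf_single = stacked_receptive_field([5])      # one 5x5 convolution
rf_double = stacked_receptive_field([3, 3])   # two stacked 3x3 convolutions
w_single = conv_weights(5, channels)
w_double = 2 * conv_weights(3, channels)
tokens = vit_sequence_length(image_size, patch_size)

print(f"receptive field: 5x5 -> {rf_single}, two 3x3 -> {rf_double}")
print(f"weights: 5x5 -> {w_single}, two 3x3 -> {w_double}")
print(f"ViT sequence length: {tokens}")
```

Under these assumptions the two 3×3 layers reach the same 5-pixel receptive field with 18/25 of the weights of the 5×5 layer, and they add an extra nonlinearity between the layers, which is the usual motivation for the decomposition.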
Keywords/Search Tags: Chinese Calligraphy, Style Classification, Convolutional Neural Network, Attention Mechanism, Vision Transformer