Ancient Chinese Bronze Inscriptions have a long history and great historical and artistic value. However, these characters are far removed from the modern era and differ greatly from modern Chinese characters, so the reading threshold is high and identification generally requires the manual work of knowledgeable and experienced scholars. In addition, the sample distribution of the Ancient Chinese Bronze Inscription dataset is severely imbalanced: some categories contain thousands of samples, while others contain only dozens. The former are called majority (or frequent) classes, and the latter minority (or rare) classes. Models trained directly on the original dataset tend to perform poorly on the minority classes. To address these problems, this paper applies deep learning to the intelligent recognition of Ancient Chinese Bronze Inscriptions, focusing on improving the model's recognition accuracy for categories with few samples. The specific work is as follows:

(1) From the perspective of data augmentation, an Enhanced Rare-class Sample Generator (ERSG) is proposed, which dynamically generates minority-class samples by embedding a sample generation module into the training stage of the neural network. The datasets in this paper are all derived from bronze ware, and ancient scripts from the same source share similar handwriting and background characteristics. The sample generation module estimates the center of each class distribution, computes the feature displacement of each majority-class sample from its class center, and then applies this displacement to the feature maps of minority-class samples to dynamically generate new minority-class samples, so that the overall data distribution becomes more balanced. ERSG improves the activation and loss functions of the original method and measures classification error with a loss based on the effective number of samples. Experiments show that this enhanced minority-class sample generation algorithm yields a measurable improvement.
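As a rough illustration of the feature-displacement mechanism described above, the following is a minimal PyTorch-style sketch under stated assumptions, not the thesis implementation: `feats` (a batch of flattened feature vectors), `labels`, `majority_cls`, and `minority_cls` are hypothetical names, and the effective-number weighting follows the commonly used formulation (1 - beta^n) / (1 - beta).

```python
# Minimal sketch (not the thesis code) of the feature-displacement idea behind ERSG.
# Hypothetical names: `feats` is a batch of feature maps flattened to (N, D) vectors,
# `labels` holds class indices, `majority_cls` / `minority_cls` pick the two classes.
import torch

def class_centers(feats: torch.Tensor, labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Estimate the feature-space center of every class seen in the batch."""
    centers = torch.zeros(num_classes, feats.size(1), device=feats.device)
    counts = torch.zeros(num_classes, device=feats.device)
    centers.index_add_(0, labels, feats)
    counts.index_add_(0, labels, torch.ones_like(labels, dtype=feats.dtype))
    return centers / counts.clamp(min=1).unsqueeze(1)

def generate_rare_samples(feats, labels, centers, majority_cls, minority_cls):
    """Transfer majority-class intra-class variation onto minority-class features."""
    maj_mask = labels == majority_cls
    min_mask = labels == minority_cls
    # Displacement of each majority-class sample from its class center.
    disp = feats[maj_mask] - centers[majority_cls]
    if disp.size(0) == 0 or int(min_mask.sum()) == 0:
        return feats.new_zeros(0, feats.size(1))
    # Reuse a random subset of these displacements around the minority features.
    idx = torch.randint(0, disp.size(0), (int(min_mask.sum()),), device=feats.device)
    return feats[min_mask] + disp[idx]  # synthetic features, labelled as minority_cls

def effective_number_weights(counts_per_class: torch.Tensor, beta: float = 0.999):
    """Per-class loss weights from the effective number of samples (1 - beta**n) / (1 - beta)."""
    eff_num = (1.0 - beta ** counts_per_class.float()) / (1.0 - beta)
    w = 1.0 / eff_num
    return w * counts_per_class.numel() / w.sum()  # normalise so weights average to 1
```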
(2) From the perspective of model parameters, drawing on the idea of residual learning, an algorithm based on residual fusion is proposed. The dataset is divided into head, middle, and tail classes in descending order of the number of samples per class, and the model uses different branch parameters for different combinations of these classes. The branches are gradually merged during training and finally fused into the master branch; the shared parameters and the branch-specific parameters are optimized simultaneously by back-propagation. Experiments show that the residual fusion algorithm improves recognition of tail classes compared with a conventional recognition model.

(3) Based on the idea of knowledge distillation, a Class-balanced Knowledge Distillation training method is proposed to compress the model while sacrificing little classification performance. The training process has two stages. In the first stage, the original sample distribution is kept and instance sampling is performed, so categories with more samples are more likely to be selected; multiple "teacher" models are trained in this way, each with a different data augmentation strategy, so that each "teacher" learns different features with which to supervise the "student" model. In the second stage, class-balanced sampling is applied to the dataset to obtain a subset with an equal number of samples per class, and the "student" model optimizes its network parameters on this subset under the supervision of the multiple "teacher" models produced in the first stage (see the sketch at the end of this section). Experiments show that class-balanced knowledge distillation improves the model's recognition of tail classes.

Several sets of experiments are conducted in this paper to demonstrate the effectiveness of the above algorithms. Although the algorithms differ in effectiveness, they approach the problem from different perspectives and suit different scenarios, so external conditions should be weighed when choosing among them. This paper focuses on the classification of ancient characters, but the models and algorithms used are equally applicable to other classification tasks with imbalanced sample distributions.
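To make the two-stage distillation procedure concrete, the sketch below is a minimal, hypothetical PyTorch-style illustration rather than the thesis implementation; `student`, `teachers`, the temperature `T`, and the weight `alpha` are assumed names and values, and the first stage (training several teachers on instance-sampled data with different augmentations) is only indicated by a comment.

```python
# Minimal sketch (not the thesis code) of class-balanced knowledge distillation.
# Stage 1 (not shown): train several "teacher" models on instance-sampled data,
# each with a different data augmentation strategy, then freeze them.
import random
from collections import defaultdict
import torch
import torch.nn.functional as F

def class_balanced_subset(labels):
    """Stage-2 sampling: keep an equal number of sample indices per class."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[int(y)].append(i)
    k = min(len(v) for v in by_class.values())
    return [i for v in by_class.values() for i in random.sample(v, k)]

def distill_step(student, teachers, x, y, T=4.0, alpha=0.5):
    """One student loss on a class-balanced batch, supervised by multiple teachers."""
    logits = student(x)
    # Hard-label loss on the class-balanced subset.
    ce = F.cross_entropy(logits, y)
    # Soft-label loss: average the frozen teachers' temperature-scaled predictions.
    with torch.no_grad():
        soft = torch.stack([F.softmax(t(x) / T, dim=1) for t in teachers]).mean(0)
    kd = F.kl_div(F.log_softmax(logits / T, dim=1), soft, reduction="batchmean") * T * T
    return alpha * ce + (1.0 - alpha) * kd
```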