Research On Chinese Recognition Algorithm Based On Maximum Entropy Regularization

Posted on: 2022-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: W H Xu
GTID: 2518306572485814
Subject: Electronics and Communications Engineering
Abstract/Summary:
Text is one of the greatest inventions of mankind. It is not only a written expression of human language but also a cultural heritage. On the one hand, text is an important information medium: books, documents, bills, and similar materials contain large amounts of it, which facilitates information exchange and thereby significantly improves office efficiency. On the other hand, the text in images and videos carries a great deal of semantic information, and recognizing it can greatly help in understanding particular scenes. Text can be seen everywhere in daily life, and in the current information age its importance keeps growing.

In Chinese recognition, the character classes are highly similar to one another, the variance within each class is large, and the data is usually heavily imbalanced across characters. These properties create a large demand for training data, so Chinese recognition models are prone to overfitting. Solving the overfitting problem is therefore meant to let the model better distinguish similar characters from one another and common characters from rare ones, improving the overall recognition rate. In this thesis, the recognition of Chinese characters and Chinese text lines is studied using deep learning techniques combined with the characteristics of Chinese. The main contributions are as follows:

(1) A Chinese character recognition method based on a maximum entropy regularization term is proposed. The method adds the maximum entropy regularization term to the cross-entropy loss function to form the model's new loss function. First, an in-depth theoretical analysis of the maximum entropy regularization term is carried out, including its influence on the converged probability distribution and on the learning process. Regularizing the model by maximum entropy improves its generalization and robustness without adding new parameters to the model. Three methods for determining the hyperparameters are proposed. Flexibly selecting the hyperparameters yields better accuracy in Chinese character recognition and alleviates the overfitting problem caused by imbalanced Chinese character data, large intra-class differences, and similar inter-class distributions. The model's recognition accuracy increases by 0.99% on the natural scene dataset and by 0.37% on the handwritten Chinese character dataset.

(2) A Chinese text line recognition method based on a maximum entropy regularization term is proposed. The method adds the maximum entropy regularization term to the Connectionist Temporal Classification (CTC) loss function to form the model's new loss function. By analyzing the peaky distribution problem that arises when the CTC loss function is used, three methods for determining the hyperparameters are proposed. Flexibly selecting the hyperparameters yields better accuracy in Chinese text line recognition and reduces the overfitting caused by the peaky distribution that the CTC loss function produces. The model's recognition accuracy increases by 0.92% on the Chinese scene text dataset and by 1.55% on the web image text dataset.
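The abstract does not give the exact form of the regularized loss, so the following is only a minimal sketch under a stated assumption: the regularizer is the Shannon entropy of the predicted distribution, subtracted from the base loss with a weight `lam`, so that higher-entropy (less peaky) predictions are rewarded. The function names, the value of `lam`, and the per-frame averaging for the text line case are all hypothetical and are not taken from the thesis.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs, eps=1e-12):
    """Shannon entropy H(p) = -sum_i p_i * log(p_i)."""
    return -sum(p * math.log(p + eps) for p in probs)


def cross_entropy(probs, target, eps=1e-12):
    """Cross-entropy for a single example with an integer class label."""
    return -math.log(probs[target] + eps)


def max_entropy_regularized_loss(logits, target, lam=0.1):
    """Assumed form: L = CE(p, y) - lam * H(p).

    Subtracting the entropy penalizes over-confident outputs.
    `lam` is a hypothetical hyperparameter; the thesis proposes its own
    selection methods, which are not reproduced here.
    """
    probs = softmax(logits)
    return cross_entropy(probs, target) - lam * entropy(probs)


def mean_frame_entropy(frame_logits):
    """Average per-frame entropy of a sequence of output distributions.

    Assumption: for text lines, a term like this could be subtracted
    from a CTC loss computed elsewhere. The thesis may define the
    regularizer over the CTC outputs differently.
    """
    entropies = [entropy(softmax(frame)) for frame in frame_logits]
    return sum(entropies) / len(entropies)


if __name__ == "__main__":
    confident = [10.0, 0.0, 0.0]  # very peaked prediction
    softer = [2.0, 0.0, 0.0]      # same predicted class, less peaked
    for name, logits in (("confident", confident), ("softer", softer)):
        loss = max_entropy_regularized_loss(logits, target=0, lam=0.1)
        print(name, "entropy:", entropy(softmax(logits)), "loss:", loss)
```

With `lam = 0` the function reduces to the plain cross-entropy loss. Raising `lam` trades some fit to the training labels for smoother output distributions, which is the mechanism the abstract credits for reducing overfitting.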
Keywords/Search Tags: Chinese character recognition, Chinese text line recognition, cross-entropy loss function, Connectionist Temporal Classification loss function, deep neural network