
Research And Application Of Handwritten Text Recognition Method Based On Deep Learning

Posted on: 2022-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: W Zhu
Full Text: PDF
GTID: 2518306731999469
Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology, people have become accustomed to using computers to process and store text. Handwriting nevertheless remains ubiquitous because it is quick and convenient, and a large body of handwritten material produced before computers became widespread still needs to be used, processed, and preserved. Recognizing handwritten text by computer therefore has significant practical value. This thesis incorporates the idea of information gain into the sample-learning phase of deep learning and applies it to offline handwritten text recognition. Experiments show that the model's recognition rate improves and that recognition time is shortened significantly. Finally, the research results are applied in a handwritten text recognition system. The main work of this thesis is as follows:

(1) To address problems that arise during image acquisition, handwritten text images are preprocessed with techniques including rotation, normalization, denoising, grayscale conversion, and binarization. Candidate preprocessing methods were first compared experimentally, and those best suited to handwritten text images were selected: the Hough transform for skew correction, a mean filter for denoising, the maximum-value method for grayscale conversion, and a global threshold for binarization, among others. An improved histogram projection method segments large blocks of handwritten text, and bilinear interpolation normalizes the image size.

(2) A BP network and a convolutional neural network are used to recognize handwritten text images, and a sample-learning method based on information gain is applied while the model's highest accuracy and running time are observed. Three information-gain methods are proposed. The aim is that, once the model has reached a certain recognition rate, it stops learning all samples indiscriminately and instead learns samples selectively according to their information gain. Method 1 introduces an information-gain learning factor I, so that the network learns less from samples it already recognizes easily and more from samples it recognizes with low confidence or misrecognizes. Method 2 introduces a learning interruption threshold F to reduce the learning of garbage samples. Method 3 introduces samples with high information gain for the model as expert samples. Experiments show that both Method 1 and Method 2 improve learning efficiency and recognition accuracy. Taking handwritten digits as an example, at the same level of accuracy the learning time of Method 1 and Method 2 is 54.4% and 59.3%, respectively.

(3) Based on these research results, a prototype handwritten text recognition system is built on the ModelArts platform of Huawei Cloud (Huaweiyun). The prototype provides handwritten text recognition, manual review, and other functions, and has practical application value. Manual review takes place after model recognition and verifies by hand those samples that the model recognized with low confidence. A large number of samples are drawn at random, and the output value s of the model's softmax function is obtained for each. Assuming that these s values follow a normal distribution, the threshold on s for manual review is determined from the 3σ principle and the importance of the text being recognized.
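
Three of the preprocessing steps named in (1), maximum-value grayscale conversion, mean filtering, and global-threshold binarization, can be illustrated with a minimal pure-Python sketch. The function names, the 3x3 window, and the threshold of 128 are assumptions made for illustration; the abstract does not state the parameters actually used, and skew correction, segmentation, and size normalization are omitted here.

```python
def to_gray_max(rgb_image):
    """Grayscale by the maximum method: each pixel becomes max(R, G, B)."""
    return [[max(pixel) for pixel in row] for row in rgb_image]


def mean_filter(gray, radius=1):
    """Denoise with a (2*radius+1)-square mean filter.

    Windows are clipped at the image border (an assumed border policy).
    """
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def binarize(gray, threshold=128):
    """Global threshold: 1 for pixels at or above the threshold, else 0.

    The value 128 is a placeholder, not a value taken from the thesis.
    """
    return [[1 if value >= threshold else 0 for value in row] for row in gray]
```

A typical call chain would be `binarize(mean_filter(to_gray_max(image)))`, where `image` is a list of rows of `(R, G, B)` tuples.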
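
The selective learning described in (2) can be sketched as a per-sample weighting rule. The abstract does not give the formulas behind the learning factor I or the interruption threshold F, so everything below is an illustrative guess rather than the thesis's method: information gain is approximated by the surprisal of the true class, the weight is assumed to be that surprisal raised to the power `factor_i`, and samples whose surprisal falls below `threshold_f` are assumed to be the ones skipped.

```python
import math


def surprisal(p_true):
    """Self-information -log(p) of the probability assigned to the true class."""
    return -math.log(max(p_true, 1e-12))  # clamp to avoid log(0)


def sample_weights(true_class_probs, factor_i=1.0, threshold_f=0.05):
    """Return one training weight per sample; 0.0 means the sample is skipped.

    Assumed rule (not from the thesis): weight = surprisal ** factor_i,
    and samples with surprisal below threshold_f are not learned again.
    """
    weights = []
    for p in true_class_probs:
        gain = surprisal(p)
        weights.append(0.0 if gain < threshold_f else gain ** factor_i)
    return weights
```

Under these assumptions, a sample the model already classifies with probability 0.99 receives weight 0, while a sample classified with probability 0.1 receives a larger weight than one classified with probability 0.5.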
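
The manual-review threshold in (3) can be sketched as follows, assuming that s is the top softmax probability of each prediction and that the threshold is the lower bound mean minus k standard deviations. The parameter `k` is a hypothetical stand-in for how the importance of the text adjusts the 3σ rule, which the abstract does not specify. Softmax outputs are bounded in [0, 1], so the normality assumption is only approximate.

```python
import statistics


def review_threshold(confidences, k=3.0):
    """Lower bound mean - k * std of the sampled softmax confidences s.

    k = 3 gives the 3-sigma rule; a smaller k (an assumption here) would
    send more samples to manual review for more important text.
    """
    mean = statistics.mean(confidences)
    std = statistics.pstdev(confidences)  # population standard deviation
    return mean - k * std


def needs_manual_review(confidence, threshold):
    """Flag a prediction whose confidence falls below the review threshold."""
    return confidence < threshold
```

In use, `review_threshold` would be computed once from a large random sample of predictions, and `needs_manual_review` would then be applied to each new prediction.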
Keywords/Search Tags: Offline text recognition, Image processing, Deep learning, Information gain, Prototype system