
Research On Chinese Character Recognition Method Based On Deep Learning

Posted on: 2022-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: F L Ren
GTID: 2518306497972459
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, text detection and recognition in natural scenes has found its way into many areas of work and daily life, with important applications in photo translation, autonomous driving, unattended processing of express waybills, bill recognition, and other fields. As OCR has moved from early scanned-document recognition to character recognition in natural scene images, its applications have grown steadily wider, and the resulting scene text recognition problem has attracted close attention from the academic community. Current academic OCR systems support multiple languages and are reasonably general, but their recognition accuracy for Chinese characters, especially visually similar ones, is not ideal. The two key steps in OCR are text detection and text recognition. To improve Chinese text recognition in natural scenes, this thesis studies both steps.

For text detection, after studying and comparing several popular text detection algorithms, this thesis selects the efficient and accurate EAST algorithm as the baseline and, to address its weak detection performance on long text, improves it in three respects: (1) the structure of the convolutional neural network in the feature extraction stage is changed to improve detection accuracy; (2) a BiLSTM network is added after the feature fusion stage, which enlarges the network's receptive field by capturing the position information of adjacent pixels; (3) the calculation of the output vertex coordinates is changed from a distance-weighted average over all pixels to a distance-weighted average over the head and tail pixels, which makes the detected text box boundaries more accurate. Comparative training and testing experiments on the dataset of the authoritative international ICDAR competition show that the improved EAST algorithm achieves higher text detection accuracy and recall.

For text recognition, building on the conventional CNN + RNN + CTC model, the thesis proposes a similar-character CRNN algorithm that exploits the structural differences between similar characters and the semantic information of the context. First, a database of visually similar Chinese characters is constructed using a Chinese character similarity algorithm, and enhanced training is carried out on the feature differences between similar characters, improving their recognition accuracy at the level of character structure. After the preliminary results are obtained, a semantic detector corrects recognition errors that do not fit the semantic context in three stages: error detection after Chinese word segmentation, candidate recall, and ranking of corrections. This further improves the recognition accuracy of similar Chinese characters at the semantic level.

A complete OCR model needs both accuracy and speed, and should be deployable on multiple types of terminals so that its range of application can grow and the research can be put into practice. The baseline algorithms used here, EAST for text detection and CRNN for text recognition, are relatively easy to deploy and have a moderate model size, and their accuracy has been further improved. The resulting model can serve as a general-purpose model in various fields and has broad application prospects.
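The CNN + RNN + CTC recognition pipeline mentioned in the abstract ends with a decoding step that turns per-frame label scores into a character string. The sketch below illustrates standard greedy (best-path) CTC decoding only; it is not the thesis's implementation. The function name `ctc_greedy_decode`, the choice of index 0 for the blank label, and the toy character set of visually similar characters are assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores, charset):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores: list of per-frame score lists, one score per label index.
    charset: list mapping label index to character (index BLANK is unused).
    """
    # Pick the highest-scoring label index at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge runs of identical consecutive labels into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blanks; a blank between two equal labels keeps them distinct.
    return "".join(charset[label] for label in collapsed if label != BLANK)


def one_hot(index, size):
    """Toy helper: a score vector that puts all mass on one label."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Hypothetical character set containing the look-alike pair 未 / 末.
charset = ["", "未", "末", "来"]
frames = [one_hot(i, len(charset)) for i in [1, 1, 0, 3, 3, 0]]
print(ctc_greedy_decode(frames, charset))  # prints 未来
```

Greedy decoding considers each frame in isolation, which is one reason a look-alike such as 末 can displace 未 in the raw output; a later correction stage like the semantic detector described above operates on exactly this kind of decoded string.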
Keywords/Search Tags:OCR, Text Detection, Text Recognition, Scene Chinese Characters, Similar Chinese Character