
Research On Connected Characters Recognition For Handwritten Checks And Its Application

Posted on: 2021-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Li
GTID: 2428330614471845
Subject: Computer Science and Technology
Abstract/Summary:
Optical character recognition has long been a research topic in computer vision, and with the continuous development of deep learning in recent years, research on scene text recognition has reached a peak. Because bank checks are widely used, applying character recognition technology to checks can greatly improve work efficiency. The characters on checks are of two types: handwritten and printed. Compared with printed characters, handwritten characters are irregular, vary widely in size and spacing, and may be corrupted by noise, all of which make recognition more difficult. This thesis addresses the recognition of handwritten characters on checks and adopts methods suited to the respective characteristics of Chinese characters and digits. The main research contents and results are as follows:

(1) The beam search module of segmentation-recognition methods is easily misled by incorrect characters on checks. To overcome this shortcoming, a reliable first beam search (RFBS) algorithm based on the reliability of CNN recognition and on semantic features is proposed. By giving priority to search intervals with higher reliability, RFBS improves the accuracy of handwritten company name recognition. Furthermore, based on the structural features of company names, a method for deducing their prefixes and suffixes is proposed, which effectively solves the problem of prefix and suffix identification. Finally, Jieba Chinese word segmentation and the positions of characters are used to detect erroneous characters in the recognition results, and an LSTM language model is used to correct them, combining component similarity with traditional shape similarity. Experimental results show that RFBS combined with this error correction method reaches an accuracy of 93.08%, significantly better than the traditional beam search algorithm. Ablation experiments further demonstrate the effectiveness of adding component similarity to shape similarity.

(2) To address the limitations of existing methods in recognizing long handwritten digit string images on checks, an end-to-end recognition framework based on pre-segmentation is proposed. In the segmentation stage, the image background is removed with the Mask dodging method, and multiple sub-images are then cropped from the original RGB image according to coordinates obtained from connected component analysis. In the recognition stage, a model built from ResNet, Bi-LSTM, and CTC provides strong feature representation and learning ability. In addition, to train the end-to-end recognition model, background images and font colors extracted from real checks are combined with various data augmentation techniques to synthesize a large number of simulated check digit string images of different lengths. Experimental results show that the proposed method reduces the average edit distance between recognition results and labels to 0.088, outperforming both segmentation-recognition and end-to-end recognition methods.
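The abstract does not say how the CTC outputs of the digit-string recognizer are decoded, or exactly how the average edit distance is computed. The following is a minimal sketch under stated assumptions: greedy (best-path) CTC decoding with the blank at index 0, a hypothetical digit-only alphabet, and unnormalized Levenshtein distance averaged over test images. The names `ALPHABET`, `ctc_greedy_decode`, `edit_distance`, and `average_edit_distance` are illustrative and do not come from the thesis, which may use beam decoding, a larger symbol set, or a normalized distance.

```python
from typing import List, Sequence

# Assumption: class 0 is the CTC blank; classes 1..10 are the digits 0-9.
BLANK = 0
ALPHABET = "-0123456789"  # hypothetical label set; "-" stands for the blank


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best:
        # A repeated class only emits a new character if a blank (or another
        # class) separated it from the previous occurrence.
        if idx != prev and idx != BLANK:
            chars.append(ALPHABET[idx])
        prev = idx
    return "".join(chars)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; insertions, deletions, substitutions cost 1."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1,               # delete ca
                           row[j - 1] + 1,                # insert cb
                           prev_row[j - 1] + (ca != cb)))  # substitute
        prev_row = row
    return prev_row[-1]


def average_edit_distance(preds: List[str], labels: List[str]) -> float:
    """Mean unnormalized edit distance over paired predictions and labels."""
    assert preds and len(preds) == len(labels)
    return sum(edit_distance(p, t) for p, t in zip(preds, labels)) / len(preds)


if __name__ == "__main__":
    # Toy per-frame distributions whose argmax path is [2, 2, 0, 2, 4].
    path = [2, 2, 0, 2, 4]
    frames = [[0.9 if k == c else 0.01 for k in range(len(ALPHABET))] for c in path]
    pred = ctc_greedy_decode(frames)
    print(pred, average_edit_distance([pred, "42"], ["113", "427"]))  # 113 0.5
```

In the toy run, the blank frame between the two class-2 frames keeps the repeated digit, so the path decodes to "113"; a prediction missing one trailing digit costs one insertion, giving an average of 0.5 over the two samples.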
Keywords/Search Tags: Handwritten Chinese text recognition, Handwritten digit string recognition, Convolutional neural network, Long short-term memory, Chinese spelling correction