Research on Bill Detection and Recognition Algorithm Based on Deep Learning

Posted on: 2021-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: L X Sui
Full Text: PDF
GTID: 2428330605968462
Subject: Signal and Information Processing
Abstract/Summary:
Bill recognition in natural scenes is a text recognition task. Extracting key text from receipts and invoices and saving it to structured documents serves many applications and services, such as efficient archiving, fast indexing, document analysis, and bill review. With the development of deep learning, natural scene text recognition tasks (such as license plate recognition and scene text recognition) have made breakthroughs in accuracy and processing speed. Bill recognition requires higher accuracy than general OCR, and because bills come in many varieties with many template frames, it is difficult to find a unified method that detects all types of bills. Low-quality bills, for example those with offset printing or with a seal covering the text, are even harder to recognize. To adapt to different types of bills and effectively recognize all characters in them, the overall recognition process in this thesis includes tilt correction, seal removal, text segmentation, and character recognition. The main work is as follows:

(1) To extract effective recognition units and reduce the impact of the bill frame on localization accuracy, the advantages and disadvantages of existing localization methods such as EAST and CTPN are compared. CTPN is used to locate the text, and the network's output is post-processed to merge text lines that satisfy a vertical overlap rate condition. To separate text that is close together but belongs to different units, the vertical projection algorithm is improved: different segmentation strategies are selected according to different discriminant conditions, yielding the text content that belongs to the same unit.

(2) Because bills carry seals, character recognition accuracy suffers when seals and characters overlap, so a seal removal algorithm based on CMY color compensation is proposed. First, the pixels closest to the C, M, and Y primary colors in the text are identified through analysis, and the color compensation transformation matrix is obtained by matrix transformation. Finally, the text content is extracted adaptively from the color-compensated image, so that the seal is removed while the overlapping text strokes are retained.

(3) In real life, many bills have offset printing, that is, text is printed on the frame of the bill, and such samples do not exist in the training data set, the Synthetic Chinese String Dataset. Therefore, when the CRNN network is used for text recognition, frame lines are simulated on the training images, and the convolutional neural network (CNN) is used to learn the frame line features, achieving end-to-end character recognition of text lines.
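The two post-processing ideas in item (1), merging detected boxes by vertical overlap rate and splitting a text line by vertical projection, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the box format `(x_min, y_min, x_max, y_max)`, the overlap definition (shared height divided by the shorter box's height), and the thresholds `min_rate` and `min_gap` are all assumptions chosen for illustration. The abstract does not state the actual merge condition or the improved segmentation strategies.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed format: (x_min, y_min, x_max, y_max)


def vertical_overlap_rate(a: Box, b: Box) -> float:
    """Shared height of the two boxes divided by the shorter box's height."""
    overlap = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    if overlap <= 0 or shorter <= 0:
        return 0.0
    return overlap / shorter


def merge_line_boxes(boxes: List[Box], min_rate: float = 0.7) -> List[Box]:
    """Greedily merge boxes whose vertical overlap rate reaches min_rate.

    Horizontal distance is deliberately ignored in this sketch.
    """
    merged: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[0]):  # left to right
        for i, m in enumerate(merged):
            if vertical_overlap_rate(m, box) >= min_rate:
                # Grow the existing line box to cover the new one.
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return merged


def split_by_vertical_projection(binary_rows: List[List[int]],
                                 min_gap: int = 3) -> List[Tuple[int, int]]:
    """Split a binarized text line (1 = ink) into column ranges.

    Returns (start, end) pairs with end exclusive. A new segment starts
    only after at least min_gap consecutive empty columns.
    """
    if not binary_rows:
        return []
    width = len(binary_rows[0])
    # Vertical projection: number of ink pixels in each column.
    profile = [sum(row[x] for row in binary_rows) for x in range(width)]
    segments: List[Tuple[int, int]] = []
    start = None
    gap = 0
    for x, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        segments.append((start, width - gap))
    return segments
```

A fixed gap threshold is the plain form of vertical projection. The thesis instead describes choosing among several segmentation strategies based on discriminant conditions, which this sketch does not attempt to reproduce.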
Keywords/Search Tags: text location, text segmentation, seal removal, character recognition