Research And Implementation Of Batch Invoice Identification System Based On OCR

Posted on: 2020-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Hu
GTID: 2428330596495022
Subject: Control Science and Engineering

Abstract/Summary:
The VAT invoice is an accounting document that records a business transaction. It is the basis of a company's bookkeeping and also serves as the expense voucher for tax purposes, so companies usually have their finance departments manage invoices. Invoice management, however, is cumbersome and tedious: the workload is large, a great deal of information has to be recorded, and manual entry consumes considerable manpower and material resources. As technology has developed, OCR has been applied in many fields, and using it to identify and record invoice information automatically can effectively improve the efficiency of finance staff.

This thesis analyzes the invoice layout to define the tasks of the invoice identification system, studies each OCR module against those requirements, and designs an OCR-based batch invoice identification system. First, invoice images are acquired with a scanner, which produces clear images and can scan multiple invoices in one pass, as the system requires. The images are then preprocessed: size normalization, a check of whether the invoice needs to be flipped, binarization, and morphological processing.

Next, text localization is studied. Because the information is printed onto the invoice form by a printer, it is often offset from its expected position depending on how the form was placed, and the page frequently contains interference. Both problems hinder localization by template matching and by connected-component analysis. This thesis therefore designs a localization algorithm that combines a multi-layer autoencoder with an SVM, which effectively addresses these problems and successfully locates the information text. Individual characters are then segmented with an improved projection-based method.

For character recognition, because the system reads printed digits, the segmented characters themselves are used to train the classifier in order to raise the recognition rate. Three classifiers are studied, a BP (back-propagation) neural network, HOG+SVM, and a CNN, and compared on recognition accuracy, efficiency, and robustness to interference; HOG+SVM is selected as the system's recognition algorithm. Finally, the modules above are integrated into a complete invoice identification system, programmed in C++ with a user interface built in MFC. System tests show that the system is easy to operate and achieves a high recognition rate, meeting the working needs of finance staff.
Keywords: OCR, text localization, character recognition, VAT invoice
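The abstract lists binarization among the preprocessing steps but does not say which method is used. The following C++ sketch is offered only as an illustration, written in the language the system was implemented in. It uses Otsu's global threshold, which is a common choice for scanned documents but is an assumption here, not a detail confirmed by the thesis. The `Gray` type, the row-major 8-bit layout, and the function names are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: 8-bit grayscale pixels stored row-major.
using Gray = std::vector<std::uint8_t>;

// Otsu's method: pick the threshold t that maximizes the between-class
// variance of the two groups {pixel <= t} and {pixel > t}.
int otsuThreshold(const Gray& img) {
    assert(!img.empty());
    std::array<double, 256> hist{};
    for (std::uint8_t p : img) hist[p] += 1.0;

    const double total = static_cast<double>(img.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    double weightLow = 0.0, sumLow = 0.0, bestVar = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        weightLow += hist[t];                         // pixels with value <= t
        if (weightLow == 0.0) continue;
        const double weightHigh = total - weightLow;  // pixels with value > t
        if (weightHigh == 0.0) break;
        sumLow += t * hist[t];
        const double meanLow = sumLow / weightLow;
        const double meanHigh = (sumAll - sumLow) / weightHigh;
        // Between-class variance, up to the constant factor 1 / total^2.
        const double diff = meanLow - meanHigh;
        const double between = weightLow * weightHigh * diff * diff;
        if (between > bestVar) { bestVar = between; best = t; }
    }
    return best;
}

// Dark pixels (printed ink) become 1, light pixels (paper) become 0.
std::vector<std::uint8_t> binarize(const Gray& img, int threshold) {
    std::vector<std::uint8_t> out(img.size());
    for (std::size_t i = 0; i < img.size(); ++i)
        out[i] = img[i] <= threshold ? 1 : 0;
    return out;
}
```

A single global threshold works well when the scanner lighting is even. If a scan were unevenly lit, a locally adaptive threshold could be the better fit.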
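The abstract says individual characters are cut with an "improved projection" method, but it does not describe the improvement. The sketch below therefore shows only the baseline vertical-projection idea that such methods build on. It counts the ink pixels in each column of a binarized text line and splits the line at empty (or near-empty) columns. The names `BinaryImage`, `columnProjection`, and `segmentColumns` are hypothetical, as are the `gapThreshold` and `minWidth` parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical binarized text-line image: row-major, 1 = ink, 0 = paper.
struct BinaryImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> px;
    int at(int x, int y) const { return px[static_cast<std::size_t>(y) * width + x]; }
};

// Vertical projection profile: number of ink pixels in each column.
std::vector<int> columnProjection(const BinaryImage& img) {
    assert(static_cast<std::size_t>(img.width) * img.height == img.px.size());
    std::vector<int> proj(img.width, 0);
    for (int y = 0; y < img.height; ++y)
        for (int x = 0; x < img.width; ++x)
            proj[x] += img.at(x, y);
    return proj;
}

// Baseline projection cut: every maximal run of columns whose count exceeds
// gapThreshold becomes one character box [begin, end); runs narrower than
// minWidth are discarded as noise.
std::vector<std::pair<int, int>> segmentColumns(const std::vector<int>& proj,
                                                int gapThreshold = 0,
                                                int minWidth = 1) {
    std::vector<std::pair<int, int>> boxes;
    const int n = static_cast<int>(proj.size());
    int start = -1;  // -1 means "not inside a character run"
    for (int x = 0; x <= n; ++x) {
        const bool ink = x < n && proj[x] > gapThreshold;  // x == n closes a trailing run
        if (ink && start < 0) {
            start = x;
        } else if (!ink && start >= 0) {
            if (x - start >= minWidth) boxes.emplace_back(start, x);
            start = -1;
        }
    }
    return boxes;
}
```

Consider a 7 x 2 line whose column counts are {2, 1, 0, 0, 2, 0, 1}. It splits into the column ranges [0, 2), [4, 5) and [6, 7). Raising `minWidth` to 2 keeps only [0, 2). An improved variant would typically add rules on top of this baseline, such as merging or splitting boxes. The thesis's actual rules are not given in the abstract.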
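HOG+SVM was chosen as the recognizer, and the core of a HOG descriptor is a per-cell histogram of gradient orientations. The sketch below computes one such histogram in simplified form, under the following assumptions:

- gradients are taken with central differences;
- orientations are unsigned and split into 9 bins over 0 to 180 degrees;
- each pixel casts a magnitude-weighted vote into a single bin.

The standard HOG formulation also interpolates votes between neighbouring bins and normalizes over blocks of cells. A full descriptor would concatenate such histograms across the character image and pass them to a trained SVM. The abstract gives no HOG or SVM parameters, so the cell layout, bin count, and function names here are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Simplified HOG cell histogram. Assumptions: the cell is a grayscale patch of
// doubles stored row-major; 9 unsigned orientation bins of 20 degrees each;
// each interior pixel votes its gradient magnitude into a single bin.
std::array<double, 9> cellOrientationHistogram(const std::vector<double>& cell,
                                               int width, int height) {
    assert(width > 0 && height > 0);
    assert(cell.size() == static_cast<std::size_t>(width) * height);
    const double kPi = 3.14159265358979323846;
    std::array<double, 9> hist{};
    auto at = [&](int x, int y) { return cell[static_cast<std::size_t>(y) * width + x]; };
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            const double gx = at(x + 1, y) - at(x - 1, y);  // [-1 0 1] filter
            const double gy = at(x, y + 1) - at(x, y - 1);  // [-1 0 1] transposed
            const double mag = std::hypot(gx, gy);
            if (mag == 0.0) continue;
            double deg = std::atan2(gy, gx) * 180.0 / kPi;  // in [-180, 180]
            if (deg < 0.0) deg += 180.0;     // unsigned: theta and theta + 180 coincide
            if (deg >= 180.0) deg -= 180.0;  // keep deg in [0, 180)
            const int bin = std::min(static_cast<int>(deg / 20.0), 8);
            hist[bin] += mag;
        }
    }
    return hist;
}

// L2-normalize a histogram (standard HOG does this over blocks of cells).
void l2Normalize(std::array<double, 9>& h) {
    double norm = 0.0;
    for (double v : h) norm += v * v;
    norm = std::sqrt(norm);
    if (norm > 0.0)
        for (double& v : h) v /= norm;
}
```

For a horizontal intensity ramp, every vote lands in the 0-degree bin. For a vertical ramp, every vote lands in the bin covering 80 to 100 degrees.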