
Research On Automatic Identification Method Of Invoice Content Based On Probability Graph Model

Posted on: 2017-05-19   Degree: Master   Type: Thesis
Country: China   Candidate: Z Y Li   Full Text: PDF
GTID: 2348330518970783   Subject: Computer Science and Technology
Abstract/Summary:
During financial reimbursement, accountants and other staff must inspect invoices and audit their contents. This manual work is simple, repetitive, and inefficient. To raise the level of computerized accounting and office automation, this thesis studies a method for automatically identifying invoice content, with the goal of automatically extracting and identifying the invoice fields that accounting work requires. After reviewing the relevant literature, the thesis analyzes and summarizes the current state of research on invoice identification. It then discusses the steps that automatic invoice identification requires: locating the invoice information, extracting it, and identifying it automatically. For image preprocessing, color, layout, and position serve as location cues, and a method is introduced for removing the interference caused by tilt and noise during image acquisition. To locate the invoice information, the thesis exploits the color difference between the fixed (pre-printed) information and the machine-printed information: a new method based on RGB features and prior knowledge locates the invoice information, and a method for identifying it automatically is presented. For attribute extraction, the recognized characters, their location features, and the number of related characters are extracted. For automatic identification, a new method based on a probabilistic graph model is proposed, which helps advance accounting computerization.

Semantic classification and recognition experiments were carried out on three kinds of invoices, including Jingdong invoices. The experimental data were collected manually: 110 Jingdong invoices, 70 unified taxi invoices, and 62 Dangdang online-shopping invoices, from which a total of 2864 items of fixed information were classified and recognized. To cover special cases, 40 incomplete taxi invoices, 40 defaced Dangdang invoices, and 60 folded Jingdong invoices were also collected, from which a total of 1492 items of fixed information were classified and recognized. Finally, the recognition results were checked and corrected with the tesseract-OCR tool, which improved the recognition rate. Compared with conventional methods, this method can also identify the contents of vertically laid-out invoices.
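The abstract does not say which tilt-correction technique the thesis uses. As one common possibility, the sketch below estimates skew with a projection profile: ink-pixel coordinates are rotated through a range of candidate angles, and the angle at which ink concentrates into the fewest rows (largest sum of squared row counts) is taken as the tilt. The function names, angle range, and step size are illustrative assumptions, not the author's method.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) coordinates of a foreground (ink) pixel


def profile_score(points: List[Point], angle_deg: float) -> int:
    """Sum of squared row counts after rotating the points by -angle_deg.

    When text lines become horizontal, ink piles up in a few rows and the
    score is large; any residual tilt spreads the ink over more rows.
    """
    a = math.radians(angle_deg)
    rows = Counter(round(y * math.cos(a) - x * math.sin(a)) for x, y in points)
    return sum(c * c for c in rows.values())


def estimate_skew(points: List[Point], max_angle: float = 10.0, step: float = 0.5) -> float:
    """Return the candidate angle (degrees) that best straightens the text lines.

    Hypothetical defaults: search from -max_angle to +max_angle in `step` increments.
    """
    n = int(round(max_angle / step))
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda a: profile_score(points, a))


# Usage: two synthetic text lines tilted by 3 degrees.
tilt = math.tan(math.radians(3))
points = [(x, x * tilt + c) for c in (50, 80) for x in range(200)]
print(estimate_skew(points))  # 3.0
```

Rotating the image by the negative of the returned angle would then straighten it; a brute-force search is used here only because it is easy to read.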
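The color-based location step can be sketched as follows. This is a minimal illustration, assuming that machine-printed content is near-black and nearly gray while pre-printed (fixed) text has one clearly dominant RGB channel; the thresholds, the `region` prior expressed as fractions of the page size, and all function names are hypothetical and do not come from the thesis.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each 0-255
Image = List[List[Pixel]]             # indexed as img[y][x]
Box = Tuple[int, int, int, int]       # (x_min, y_min, x_max, y_max), inclusive


def classify_pixel(p: Pixel, dark_max: int = 90, gray_tol: int = 30, dominance: int = 60) -> str:
    """Label a pixel 'machine', 'fixed', or 'background' from its RGB values.

    Assumed (hypothetical) rules: machine-printed ink is dark and nearly gray;
    pre-printed fixed text has one channel well above the others.
    """
    hi, lo = max(p), min(p)
    if hi <= dark_max and hi - lo <= gray_tol:
        return "machine"
    if hi - lo >= dominance:
        return "fixed"
    return "background"


def bounding_box(img: Image, label: str) -> Optional[Box]:
    """Smallest box containing every pixel with the given label, or None."""
    xs, ys = [], []
    for y, row in enumerate(img):
        for x, p in enumerate(row):
            if classify_pixel(p) == label:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def locate_field(img: Image, region: Tuple[float, float, float, float]) -> Optional[Box]:
    """Locate machine-printed content inside a prior-knowledge region.

    `region` is (x0, y0, x1, y1) as fractions of the page size and stands in
    for layout knowledge of where a field is expected on one invoice type.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    x0, y0 = int(region[0] * w), int(region[1] * h)
    x1, y1 = int(region[2] * w), int(region[3] * h)
    crop = [row[x0:x1] for row in img[y0:y1]]
    box = bounding_box(crop, "machine")
    if box is None:
        return None
    # Shift the box from crop coordinates back to full-page coordinates.
    return (box[0] + x0, box[1] + y0, box[2] + x0, box[3] + y0)


# Usage on a tiny synthetic 10x10 page.
WHITE, BLACK, RED = (255, 255, 255), (0, 0, 0), (200, 30, 30)
page = [[WHITE] * 10 for _ in range(10)]
page[3][2] = BLACK
page[6][5] = BLACK
page[1][7] = RED
print(bounding_box(page, "machine"))             # (2, 3, 5, 6)
print(locate_field(page, (0.0, 0.5, 1.0, 1.0)))  # (5, 6, 5, 6)
```

Restricting the search to a region keeps a stray dark mark elsewhere on the page from enlarging the box, which is one way prior knowledge of the invoice layout can complement the color cue.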
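The abstract names a probabilistic graph model (Bayesian network) for field identification but does not give its structure. The simplest Bayesian network is naive Bayes, in which the field label is the single parent of every feature node; the sketch below uses that structure with discretized versions of the features the abstract lists (position and number of characters) plus a digit-ratio feature derived from the recognized characters, which is an added assumption. Field names, bucket boundaries, and training samples are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Features = Tuple[str, ...]


def extract_features(text: str, x: float, y: float, page_w: float, page_h: float) -> Features:
    """Discretize one recognized text block into (position, length, content) features."""
    col = min(int(3 * x / page_w), 2)   # 3x3 grid cell of the block's anchor point
    row = min(int(3 * y / page_h), 2)
    n = len(text)
    length = "short" if n <= 4 else "medium" if n <= 12 else "long"
    digits = sum(ch.isdigit() for ch in text)
    kind = "numeric" if n and digits / n > 0.5 else "text"
    return (f"r{row}c{col}", length, kind)


class NaiveBayesFieldClassifier:
    """Naive Bayes: the field label is the only parent of every feature node."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant
        self.class_counts: Counter = Counter()
        self.value_counts: Dict[Tuple[str, int], Counter] = defaultdict(Counter)
        self.seen_values: Dict[int, Set[str]] = defaultdict(set)

    def fit(self, samples: List[Tuple[Features, str]]) -> "NaiveBayesFieldClassifier":
        for feats, label in samples:
            self.class_counts[label] += 1
            for i, v in enumerate(feats):
                self.value_counts[(label, i)][v] += 1
                self.seen_values[i].add(v)
        return self

    def log_joint(self, feats: Features, label: str) -> float:
        """log P(label) + sum over i of log P(feature_i | label), smoothed."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        for i, v in enumerate(feats):
            k = len(self.seen_values[i]) or 1
            count = self.value_counts[(label, i)][v]
            score += math.log((count + self.alpha) / (self.class_counts[label] + self.alpha * k))
        return score

    def predict(self, feats: Features) -> str:
        return max(self.class_counts, key=lambda c: self.log_joint(feats, c))


# Usage with hypothetical training blocks on a 300x200 page.
W, H = 300, 200
train = [
    (extract_features("04512876", 250, 20, W, H), "invoice_number"),
    (extract_features("04519932", 240, 15, W, H), "invoice_number"),
    (extract_features("128.50", 260, 170, W, H), "amount"),
    (extract_features("1999.00", 250, 180, W, H), "amount"),
    (extract_features("Beijing Example Trading Co Ltd", 30, 150, W, H), "seller"),
    (extract_features("Shanghai Sample Retail Co Ltd", 40, 160, W, H), "seller"),
]
model = NaiveBayesFieldClassifier().fit(train)
print(model.predict(extract_features("56.00", 255, 175, W, H)))    # amount
print(model.predict(extract_features("04517700", 245, 25, W, H)))  # invoice_number
```

A richer Bayesian network would add edges between features (for example, letting the length bucket depend on position), at the cost of more parameters to estimate from the small labeled sets described above.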
Keywords/Search Tags: automatic identification of invoice, semantic recognition, probabilistic graph model, Bayesian network