
Research On Automatic Identification Of Invoice Based On OCR

Posted on: 2015-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: T H Wang
Full Text: PDF
GTID: 2348330518970629
Subject: Engineering
Abstract/Summary:
With the development of office automation, printed invoices have replaced handwritten ones. At present, people have to read the serial numbers of an invoice manually and look them up on the tax bureau's website to verify the invoice. This process is error-prone and time-consuming. This thesis presents an optical character recognition (OCR) method for identifying invoices. Modified versions of the bottom-hat transform and the OTSU algorithm are used to obtain a binary image of the invoice despite the effects of photography and printing. Lines and rectangles are detected with the Hough transform, and the image is rectified by rotating it in the opposite direction by the mean of the detected slope angles. The many unrelated characters on an invoice make recognition difficult. Based on the features of the serial numbers, this thesis proposes a character-region segmentation algorithm that uses contour features, and gives formulae for calculating the self-adaptive extension value. Compared with morphological or machine learning approaches, the algorithm is easy to implement and applies to most situations. Cluster analysis is used to merge the selected characters, with a formula that accounts for both the distance between cluster members and their direction. The chain code method is discussed as a way to store and process contour information, including contours that are one pixel wide. False breaks or connections can be repaired by analyzing the positional relationships of connected components. After the pixel distribution features and the structural features of crossover points and strokes are extracted, the characters are recognized with a one-versus-one multi-class support vector machine (SVM).
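The binarization step can be illustrated with a minimal sketch of the standard OTSU threshold. This is the textbook algorithm, not the modified variant the thesis describes, whose details the abstract does not give. The function names `otsu_threshold` and `binarize` are hypothetical, and the sketch assumes 8-bit grayscale input with dark ink on a lighter background.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    `pixels` is a flat list of integers in 0..255. Pixels with value <= t
    form one class, the rest form the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break     # upper class is empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, t):
    """Mark dark pixels (value <= t) as 1 (ink) and the rest as 0."""
    return [[1 if p <= t else 0 for p in row] for row in image]
```

A photographed invoice usually has uneven lighting, which is presumably why the thesis combines the threshold with a bottom-hat transform; a single global threshold like this one is only the starting point.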
To optimize the parameters, cross-validation is used to calculate the accuracy of different parameter combinations. Experimental results show that the proposed character-region segmentation method and the SVM-based OCR program can identify the serial numbers of different kinds of invoices. The approach also applies to character recognition in other two-dimensional images, and it generalizes better than format-matching methods.
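The parameter search by cross-validation can be sketched as a k-fold grid search. The abstract does not say which parameters or kernel were tuned, so the names `C` and `gamma` in the usage note are assumptions (they are the usual parameters of an RBF-kernel SVM). The helpers `k_fold_splits` and `grid_search` and the callback `fit_and_score` are hypothetical names introduced for illustration.

```python
from itertools import product


def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Samples are not shuffled here; in practice the data would normally be
    shuffled or stratified by class before splitting.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def grid_search(param_grid, n_samples, k, fit_and_score):
    """Return (best_params, best_mean_accuracy) over all parameter combinations.

    `fit_and_score(params, train_idx, test_idx)` is supplied by the caller:
    it trains a classifier on the training indices and returns its accuracy
    on the test indices.
    """
    names = sorted(param_grid)
    best_params, best_acc = None, -1.0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, train, test)
                  for train, test in k_fold_splits(n_samples, k)]
        mean_acc = sum(scores) / len(scores)
        if mean_acc > best_acc:
            best_params, best_acc = params, mean_acc
    return best_params, best_acc
```

A call such as `grid_search({"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}, n_samples, 5, fit_and_score)` would evaluate nine combinations with five folds each, where `fit_and_score` wraps whatever SVM library is in use.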
Keywords/Search Tags:Automatic identification of invoices, Contour feature, Character region segmentation, Character recognition, Support vector machine